fastDNAml 1.2

Gary J. Olsen, Department of Microbiology
University of Illinois, Urbana, IL
gary@phylo.life.uiuc.edu

Ross Overbeek, Mathematics and Computer Science
Argonne National Laboratory, Argonne, IL
overbeek@mcs.anl.gov


Citing fastDNAml

If you publish work using fastDNAml, please cite the following publications:

    Olsen, G. J., Matsuda, H., Hagstrom, R., and Overbeek, R. 1994. fastDNAml: A tool for construction of phylogenetic trees of DNA sequences using maximum likelihood. Comput. Appl. Biosci. 10: 41-48.

    Felsenstein, J. 1981. Evolutionary trees from DNA sequences: A maximum likelihood approach. J. Mol. Evol. 17: 368-376.


What is fastDNAml

fastDNAml is a program derived from Joseph Felsenstein's version 3.3 DNAML (part of his PHYLIP package). Users should consult the documentation for DNAML before using this program.

fastDNAml is an attempt to solve the same problem as DNAML, but to do so faster and using less memory, so that larger trees and/or more bootstrap replicates become tractable. Much of fastDNAml is merely a recoding of the PHYLIP 3.3 DNAML program from PASCAL to C.

DNAML includes the following notice:

    version 3.3. (c) Copyright 1986, 1990 by the University of Washington and Joseph Felsenstein. Written by Joseph Felsenstein. Permission is granted to copy and use this program provided no fee is charged for it and provided that this copyright notice is not removed.


Why is fastDNAml faster?

    Some recomputation of values has been eliminated (Joe Felsenstein has done much of this in version 3.4 DNAML).

    The optimization of branch lengths has been accelerated by changing from an EM method to Newton's method (Joe Felsenstein has done much of this in version 3.4 DNAML).

    The strategy for simultaneously optimizing all of the branches on the tree has been modified to spend less time getting an individual branch right before improving the other branches.


Other new features in fastDNAml

    fastDNAml includes a checkpoint feature to regularly save its progress toward finding a large tree. If the program is interrupted, a minor change to the input file and adding the R (restart) option permits the work to be resumed from the last checkpoint.

    The new R (restart) option can also be used for more rapid addition of new sequences to a previously computed tree (when new sequences are added to the alignment, it is best if the relative alignment of the previous sequences is not altered).

    The G (global) option has been generalized to permit crossing any number of branches during tree rearrangements. In addition, it is possible to modify the extent of rearrangement explored during the sequential addition phase of tree building.

    The G U (global and user tree) option combination instructs the program to find the best of the user trees, and then look for rearrangements that are better still.

    The number of available rate categories has been raised from 9 to 35.

    The weighting mask accepts values from 0 through 35.

    The new B (bootstrap) option causes generation of a bootstrap sample, drawn from the input data.

    The program includes "P4" code for distributing the problem over multiple processors (either within one machine, or across multiple machines).


Do DNAML and fastDNAml give the same answer?

Generally yes, though there are some reservations:

    One or the other might find a better tree due to minor changes in the ways trees are searched.

    When sequence addition is replicated with different values of the jumble random number seed, the two programs have about the same probability of finding the best tree, but any given seed might give different trees in the two programs.

    The likelihoods and branch lengths sometimes differ very slightly due to different criteria for stopping the optimization process.

    Little has been done to check the confidence limits on branch lengths. There seem to be some instances in which the programs disagree, and we think that fastDNAml is correct. However, do not take the "significantly greater than zero" too seriously.

If you are concerned, you can supply a tree inferred by fastDNAml as a user tree to DNAML and let it (1) reoptimize branch lengths, (2) tell you the confidence limits and (3) tell you the tree likelihood.


Changes and new features in version 1.2

    The program can now calculate the likelihood of extremely large user trees. The largest tree we have tested had 3200 taxa. Generally, you will run out of computer memory before you exceed an intrinsic limitation. (With this, it is possible to compare trees found by whatever your favorite methods are under the likelihood criterion.)

    The computation has been changed to make it easier to implement new models of evolution and the analysis of amino acid sequences (though these have not yet been done). This has slowed down the program by 5-10%.


Changes and new features in version 1.1

    The quickadd option is now the default. This has the ugly effect of reversing the meaning of putting a Q on the option line. (Sorry about this, and the next note, but in the long run it is the better behavior.)

    Use of empirical base frequencies is now the default. This reverses the meaning of the F option, making the default behavior more like that of PHYLIP.

    The tree output file is now generated by default and should be more compatible with the files written and read by the PHYLIP programs. In particular, the comments with information about the tree, its likelihood, etc. are removed, and there are no quotation marks around names unless there are unusual characters within the name. (There are two things to be very careful about in names: there is no completely consistent way to handle both blanks and underscores in names without quotation marks, and when a name is spaced in from the margin in the input file, there are leading blank spaces in the name, which can be very hard to make compatible with some programs.)

    The program now maintains a list of the several best trees, not just the (single) best. In particular, when evaluating user-supplied trees, the program tries to save information about all of the trees and provides a Kishino and Hasegawa type test of whether each tree is significantly worse than the best tree. Note, the current version of the program prints the report in the order of tree likelihood, NOT in the order the trees are supplied to the program. The best way (at present) to figure out which tree is which is to look at the likelihoods. This is the same test used in PHYLIP, but I had removed access in version 1.0 of fastDNAml due to differences in how the programs handle multiple trees. The difference is that fastDNAml can maintain nearly optimal trees all the time, so you can get a list of the N best trees found by using the new K option (below).

    The program should accept rooted trees (strictly bifurcating), as well as unrooted trees (with a trifurcation at the deepest level). This is not fully tested, but it seems to work.


Features in the works

    Testing subtree exchanges (as well as moving a single subtree) in the search for better trees.

    Allowing the program to optimize any user-defined subset of branches when user lengths are supplied.


Input and Options

Basics

The input to fastDNAml is similar to that used by DNAML (and the other PHYLIP programs). The user should consult the PHYLIP documentation for a basic description of the format.

This version of fastDNAml expects to get its input from stdin (standard input) and writes its output to stdout (standard output). (There are compile time options to modify this, for those who care to get into such things.) On a UNIX or DOS system, it is a simple matter to redirect input from a file and output to a file:

    fastDNAml < infile > outfile

On a VMS system it is only slightly more difficult. Immediately before running the program, one includes two commands that define the input and output files:

    $ Define/User Sys$Input infile
    $ Define/User Sys$Output outfile
    $ Run fastDNAml

The default input data format is Interleaved (see I option). To help get data from a GenBank or similar format, the interleaved option can be switched off with the I option. Numbers in the sequence data (i.e., sequence position numbers) will be ignored, so they need not be stripped out.

(Note that the program also writes a file called checkpoint.PID. See the R option below for more description.)


1 -- Print Data

By default, fastDNAml does not echo the sequence data to the output file. Option 1 reverses this.


3 -- Do Not Print Tree

By default, fastDNAml prints the final tree to the output file. Option 3 reverses this.


4 -- Do Not Write Tree to File (***** Changed in version 1.1 *****)

By default, fastDNAml versions 1.1 and 1.2 write a machine readable (Newick format) copy of the final tree to an output file. Option 4 reverses this. The tree output file will be called treefile.PID (where PID is the process ID under which fastDNAml is running). Look at the Y option below for more information on alternative tree formats.


B -- Bootstrap

Generates a bootstrap sample of the input data. Requires auxiliary data line of the form:

    B random_number_seed

Example:

    5 114 B
    B 137
    Sequence1 ACACGGTGTCGTATCATGCTGCAGGATGCTAGACTGCGTCANATGTTCGTACTAACTGTG
    ...

If the W option is used, only positions that have nonzero weights are used in computing the bootstrap sample.

Warning: For a given random number seed, the sample will always be the same.

PHYLIP DNAML does not include a bootstrap option. (Use the SEQBOOT program.)


C -- Categories

Requires auxiliary data of the form:

    C number_of_categories list_of_category_rates

The maximum number of categories is 35. This line is followed by a list of the rate categories for each site:

    Categories list_of_categories [per site, one or more lines]

Category "numbers" are ordered: 1, 2, 3, ..., 9, A, B, ..., Y, Z. Category zero (undefined rate) is permitted at sites with a zero in a user-supplied weighting mask.

Example:

    5 114 C
    C 12 0.0625 0.125 0.25 0.5 1 2 4 8 16 32 64 128
    Categories 5111136343678975AAA8949995566778888889AAAAAA9239898629AAAAA9
               633792246624457364222574877188898132984963499AA9899975
    Sequence1 ACACGGTGTCGTATCATGCTGCAGGATGCTAGACTGCGTCANATGTTCGTACTAACTGTG
    ...

PHYLIP DNAML is limited to categories 1 through 9. Also, in PHYLIP version 3.3, the categories data came after all the other auxiliary data, but before the user-supplied base frequencies and sequence data. If you make the C line your last auxiliary data line, the programs will behave the same.


F -- Empirical Frequencies (***** Changed in version 1.1 *****)

By default (starting with version 1.1), the program uses base frequencies derived from the sequence data (called empirical base frequencies). Therefore the input file should normally NOT include a base frequencies line preceding the data.

If you want to supply your own base frequency data, it is now necessary to use the F option and add a line to the input file that supplies the frequencies. The F option instructs the program to use the user-supplied base frequencies instead of those derived from the sequence data. In the basic format, the base frequencies line IMMEDIATELY precedes the data:

    5 114 F
    0.25 0.30 0.20 0.25
    Sequence1 ACACGGTGTCGTATCATGCTGCAGGATGCTAGACTGCGTCANATGTTCGTACTAACTGTG
    ...

There is an alternative format: the frequencies can be anywhere in the list of auxiliary data lines if they are preceded by an F in the first column:

    5 114 F C W
    F 0.25 0.30 0.20 0.25
    C ...
    ...
    W ...
    Sequence1 ACACGGTGTCGTATCATGCTGCAGGATGCTAGACTGCGTCANATGTTCGTACTAACTGTG
    ...


G -- Global

If the global option is specified, there may also be an [optional] auxiliary data line of the form:

    G N1

or

    G N1 N2

N1 is the number of branches to cross in rearrangements of the completed tree. N2 is the number of branches to cross in testing rearrangements during the sequential addition phase of tree inference. In the following, numsp is the number of sequences:

    N1 = 1:             local rearrangement (default without G option)
    1 < N1 < numsp-3:   regional rearrangements (crossing N1 branches)
    N1 >= numsp-3:      global rearrangements (default with G option)
    N2 <= N1:           the default N2 is 1, local rearrangements

The G option can also be used to force branch swapping on user trees, that is, a combination of G and U options.

If the auxiliary line is supplied, it cannot be the last line of auxiliary data. (It may be necessary to add the T option with an auxiliary data line of T 2.0 if no other auxiliary data are used.)

Examples:

Do local rearrangements after each addition, and global after last addition:

    5 114 G
    Sequence1 ACACGGTGTCGTATCATGCTGCAGGATGCTAGACTGCGTCANATGTTCGTACTAACTGTG
    ...

Do local rearrangements after each addition, and regional (crossing 4 branches) after last addition:

    5 114 G T
    G 4
    T 2.0
    Sequence1 ACACGGTGTCGTATCATGCTGCAGGATGCTAGACTGCGTCANATGTTCGTACTAACTGTG
    ...

Do no rearrangements after each addition, and local after last addition:

    5 114 G T
    G 1 0
    T 2.0
    Sequence1 ACACGGTGTCGTATCATGCTGCAGGATGCTAGACTGCGTCANATGTTCGTACTAACTGTG
    ...

PHYLIP DNAML does not support the auxiliary data line or branch swapping on a user tree.


I -- Not Interleaved

By default, fastDNAml 1.2 expects data lines for the various sequences in an interleaved format (as did PHYLIP 3.3 DNAML).
The I option reverses the expected format (to non-interleaved data, in which all the data lines for one sequence appear before the next sequence begins). This is particularly useful for editing a GenBank or equivalent format into a valid input file (note that numbers within the sequence data are ignored, so it is not necessary to remove them).

If all the data for each sequence are on one line, then the interleaved and non-interleaved formats coincide. (This is the way David Swofford's PAUP program writes PHYLIP format output files.) The drawback is that many programs do not handle long lines of text. This includes the vi and EDT text editors, many electronic mail programs, and some versions of FTP for VAX/VMS systems.

PHYLIP 3.3 DNAML expects interleaved data, and does not include an I option to alter this. PHYLIP 3.4 DNAML accepts an I option, but the default format is reversed.


J -- Jumble

Randomize the sequence addition order. Requires an auxiliary input line of the form:

    J random_number_seed

Example:

    5 114 J
    J 137
    Sequence1 ACACGGTGTCGTATCATGCTGCAGGATGCTAGACTGCGTCANATGTTCGTACTAACTGTG
    ...

Note that fastDNAml explores a very small number of alternative tree topologies relative to a typical parsimony program. There is a very real chance that the search procedure will not find the tree topology with the highest likelihood. Altering the order of taxon addition and comparing the trees found is a fairly efficient method for testing convergence. Typically, it would be nice to find the same best tree at least twice (if not three times), as opposed to simply performing some fixed number of jumbles and hoping that at least one of them will be the optimum.


K -- Keep multiple best trees (***** New in version 1.1 *****)

The program can keep a list of the best trees that it has found. When the program is done, it prints a list of these, from best to worst, and prints a Kishino and Hasegawa type test as to which trees are significantly worse than the best tree found.
When evaluating user-supplied trees, the program automatically keeps all trees. In other situations, the program keeps only the best tree that it has found. The K option, and its associated auxiliary data line, can be used to define an alternative number.

Example, to keep the 15 best trees found:

    5 114 K
    K 15
    Sequence1 ACACGGTGTCGTATCATGCTGCAGGATGCTAGACTGCGTCANATGTTCGTACTAACTGTG
    ...

Example, to keep only the one best tree of possibly numerous user-supplied trees:

    5 114 K U
    K 1
    Sequence1 ACACGGTGTCGTATCATGCTGCAGGATGCTAGACTGCGTCANATGTTCGTACTAACTGTG
    ...


L -- User Lengths

Causes user trees to be read with branch lengths (and it is an error to omit any of them). Without the L option, branch lengths in user trees are not required, and are ignored if present.

Example:

    5 114 U L
    Sequence1 ACACGGTGTCGTATCATGCTGCAGGATGCTAGACTGCGTCANATGTTCGTACTAACTGTG
    ...

(The U is for user tree and the L for user lengths.)


O -- Outgroup

Use the specified sequence number for the outgroup. Requires an auxiliary data line of the form:

    O outgroup_number

Example:

    5 114 O
    O 5
    Sequence1 ACACGGTGTCGTATCATGCTGCAGGATGCTAGACTGCGTCANATGTTCGTACTAACTGTG
    ...

This option only affects the way the tree is drawn (and written to the treefile).


Q -- Quickadd (***** Changed in version 1.1 *****)

The quickadd feature greatly decreases the time spent initially placing a new sequence in the growing tree (but does not change the time required to subsequently test rearrangements). The overall time savings seems to be about 30%, based on a number of test cases. Its downside, if any, is unknown. This is now (starting in version 1.1) the default program behavior.

If the analysis is run with a global option of "G 0 0", so that no rearrangements are permitted, the tree is built very approximately, but very quickly. This may be of greatest interest if the question is, "Where does this one new sequence fit into this known tree?" The known tree is provided with the restart option (below).

PHYLIP DNAML does not include anything comparable to the quickadd feature. The quickadd feature can be turned OFF by adding a Q to the first line of the input file.


R -- Restart

The R option causes the program to read a user-supplied tree with fewer than the full number of taxa as the starting point for sequential addition of the remaining taxa. Thus, the sequence data must be followed by a valid (Newick format) tree. (The phylip_tree/2, prolog fact format, is now also supported.)

The restart option can also be used to increase the range of the search for alternative (better) trees. For example, you can take a tree produced with only "local" tree rearrangements, and increase the rearrangements to "regional" or "global" by combining the appropriate global option with the restart option. If the starting tree was written by fastDNAml, then the extent of rearrangements is saved with the tree, and will be used as the starting point for the additional search. If the tree was already globally optimized, then no additional searching will be performed.

To support the R option, after each taxon is added to the growing tree, and after each round of rearrangements, the program appends a checkpoint tree to a file called checkpoint.PID, where PID is the process number of the running fastDNAml program. The last line of this file needs to be appended to the input file when the R option is used. (This should not be confused with the U (user tree) option, which expects a number followed by that number of trees. No additional taxa are added to user trees.)

The UNIX utility tail can be used to extract the last tree from the checkpoint file, and the utility cat can be used to append it to the input. For example, the following script can be used to add a starting tree and the R option to a data file, and restart fastDNAml:

    #! /bin/sh
    if test $# -ne 1
    then
        echo "Usage: restart checkpoint_file" >&2
        exit 1
    fi
    IFS= read -r first_line     # first line of data file
    echo "$first_line R"        # add restart option
    cat -                       # rest of data file
    tail -n 1 "$1"              # append last tree in checkpoint file

If this shell script is in the file called restart, then one might use the command:

    restart  checkpoint.21312  <  infile  |  fastDNAml       >  new_outfile
    ^script  ^checkpoint tree     ^data      ^dnaml program     ^output_file

If this is too opaque, don't worry about it, or talk with your local UNIX wizard. In the meantime, this and other useful shell scripts are provided with the program.

PHYLIP DNAML does not write checkpoint trees and does not have a restart option.


T -- Transition/transversion ratio

Use a user-specified ratio of transition to transversion type substitutions. Without the T option, a value of 2.0 is used. Requires an auxiliary data line of the form:

    T ratio

Example:

    5 114 T
    T 1.0
    Sequence1 ACACGGTGTCGTATCATGCTGCAGGATGCTAGACTGCGTCANATGTTCGTACTAACTGTG
    ...

(Note that a T option with a value of 2.0 does nothing, but it can provide a last auxiliary data line following optional auxiliary data. See the examples for G and Y.)


U -- User Tree(s)

Read an input line with the number of user-specified trees, followed by the specified number of trees. These data immediately follow the sequence data.

The trees must be in Newick format, and terminated with a semicolon. (The program also accepts a pseudo_newick format, which is a valid prolog fact.)

The tree reader in this program is more powerful than that in PHYLIP 3.3. In particular, material enclosed in square brackets, [ like this ], is ignored as comments; taxa names can be wrapped in single quotation marks to support the inclusion of characters that would otherwise end the name (i.e., '(', ')', ':', ';', '[', ']', ',' and ' '); names of internal nodes are properly ignored; and exponential notation (such as 1.0E-6) for branch lengths is supported.

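The input layout for the U option (U on the first line, the sequence data, then a tree count followed by the trees) can be assembled by a small filter in the same style as the restart script above. The following is only a sketch, not the usertree script distributed with the program: the function name add_usertrees is hypothetical, and it assumes that the tree file holds exactly one Newick tree per non-blank line.

```sh
#!/bin/sh
# add_usertrees: hypothetical helper (not the distributed usertree script).
# Reads a fastDNAml data file on standard input and writes it to standard
# output with the U option added, followed by the tree count and the trees.
# Assumption: the tree file has exactly one Newick tree per non-blank line.
add_usertrees() {
    tree_file=$1
    IFS= read -r first_line || return 1        # first line of data file
    printf '%s U\n' "$first_line"              # add user tree option
    cat                                        # rest of data file
    # grep -c prints the number of lines that contain a non-blank character
    count=$(grep -c '[^[:space:]]' "$tree_file")
    printf '%s\n' "$count"                     # number of user trees
    grep '[^[:space:]]' "$tree_file"           # the trees themselves
}
```

With the function defined in the current shell, a run might look like: add_usertrees my_trees < infile | fastDNAml > outfile (my_trees is a placeholder file name). Counting lines rather than semicolons keeps the sketch simple, but it will miscount if a tree is wrapped over several lines.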

W -- Weights

Read user-specified column weighting information. This option requires auxiliary data of the form:

    Weights   list_of_weight_values [per site, one or more lines]

Example:

    5 114 W
    Weights   111111111111001100000100011111100000000000000110000110000000
              111101111111111111111111011100000111001011100000000011
    Sequence1 ACACGGTGTCGTATCATGCTGCAGGATGCTAGACTGCGTCANATGTTCGTACTAACTGTG
    ...

It is necessary that the weight values not start before the 11th character in the line, or some of them will be lost. Weights from 0 to 35 are indicated by the series: 0, 1, 2, 3, ..., 9, A, B, ..., Y, Z.

PHYLIP DNAML does not support user weights with values other than 1 or 0. This limit has been removed in fastDNAml to permit the use of user weights as a mechanism for representing a bootstrap sample (that is, only the auxiliary data lines change, not the body of the data file).


Y -- Write Tree (***** Changed in version 1.1 *****)

fastDNAml writes the final tree to an output file called treefile.PID. By default the tree is in PHYLIP format. The Y option allows turning this off, or changing the format of the tree. The Y option by itself toggles the saving of the tree, on or off. If there is also an auxiliary input line of the form:

    Y number

where number can be 1, 2, or 3, the number selects one of three tree output formats:

    1   Newick
    2   Prolog
    3   PHYLIP (default)

Newick is the tree standard used by PAUP, MacClade, and several other programs. The tree includes a comment about the analysis that the tree is based upon. fastDNAml uses this comment when it reads a tree. In addition, the names of the taxa are enclosed in quotation marks. Both of these features of the file make it incompatible with the PHYLIP package.

PHYLIP is the subset of the Newick tree standard used by programs in the PHYLIP package. There are no comments and no quotation marks around names.
(If a name includes unusual characters, such as a comma, fastDNAml will put it in quotation marks, making it a valid tree, but it cannot be read by the PHYLIP programs.)

The Prolog format is very similar to the Newick format, but it is a valid prolog fact that permits direct loading into some sequence analysis tools that we use. The structure of the term is:

    pseudo_newick([Comment], (Subtree1, Subtree2, Subtree3): Length).

where each subtree is either

    (Subtree1,Subtree2): Length

or

    Label: Length

The comment is a valid prolog term when && is defined as a unary operator. Label is a prolog atom (it is a valid Newick label, with single quotation marks). Length is a number.

Because the Y auxiliary input line is optional, it cannot be the last auxiliary data line.

Examples. To turn off the saving of the tree,

    5 114 Y
    Sequence1 ACACGGTGTCGTATCATGCTGCAGGATGCTAGACTGCGTCANATGTTCGTACTAACTGTG
    ...

or, to change the output to the full Newick format,

    5 114 Y T
    Y 1
    T 2.0
    Sequence1 ACACGGTGTCGTATCATGCTGCAGGATGCTAGACTGCGTCANATGTTCGTACTAACTGTG
    ...

PHYLIP DNAML does not append the PID (process ID) to the tree file name and does not support the full Newick standard or the prolog format output.

=============================================================================

Acknowledgements:

The origin and development of fastDNAml as a program to extend the use of maximum likelihood phylogenetic inference to larger sets of DNA sequences was encouraged by Carl Woese. Through the development and evolution of the program, Joseph Felsenstein has been extremely helpful and encouraging.

Numerous users have made suggestions and/or reported program bugs:

    Gary Nunn
    Tom Schmidt
    Ross Overbeek
    Hideo Matsuda
    Mitchell Sogin
    Brenden Rielly

=============================================================================

Examples:

Data file with empirical frequencies (generic analysis) (notice that blank lines are permitted in the data):

    5 114
    Sequence1 ACACGGTGTCGTATCATGCTGCAGGATGCTAGACTGCGTCANATGTTCGTACTAACTGTG
    Sequence2 ACGCGGTGTCGTGTCATGCTACATTATGCTAGACTGCGTCGGATGCTCGTATTGACTGCG
    Sequence3 ACGCGGTGCCGTGTNATGCTGCATTATGCTCGACTGCGRCGGATGCTAGTATTGACTGCG
    Sequence4 ACGCGCTGCCGTGTCATCCTACACGATGCYAGACAGCGTCAGCTGCTAGTACTGGCTGAG
    Sequence5 ACGCGCTGTCGTGTCATACTGCAGGATGCTAGACTGCGTCAGCTGCTAGTACTGGCTGAG

              AGCTCGATGATCGGTGACGTAGACTCAGGGGCCATGCCGCGAGTTTGCGATGCG
              AGCACGGTGATCAATGACGTAGNCTCAGGRTCCACGCCGTGACTTTGTGATNCG
              AGCACGATGACCGATGACGTAGACTGAGGGTCCGTGCCGCGACTTTGTGATGCG
              ACCTCGGTGATTGATGACGTAGACTGCGGGTCCATGCCGCGATTTTGCGRTGCG
              ACCTCGATGCTCGATGACGTAGACTGCGGGTCCATGCCGTGATTTTGCGATGCG

Data file with empirical frequencies and a random addition order:

    5 114 J
    J 137
    Sequence1 ACACGGTGTCGTATCATGCTGCAGGATGCTAGACTGCGTCANATGTTCGTACTAACTGTG
    Sequence2 ACGCGGTGTCGTGTCATGCTACATTATGCTAGACTGCGTCGGATGCTCGTATTGACTGCG
    Sequence3 ACGCGGTGCCGTGTNATGCTGCATTATGCTCGACTGCGRCGGATGCTAGTATTGACTGCG
    Sequence4 ACGCGCTGCCGTGTCATCCTACACGATGCYAGACAGCGTCAGCTGCTAGTACTGGCTGAG
    Sequence5 ACGCGCTGTCGTGTCATACTGCAGGATGCTAGACTGCGTCAGCTGCTAGTACTGGCTGAG

              AGCTCGATGATCGGTGACGTAGACTCAGGGGCCATGCCGCGAGTTTGCGATGCG
              AGCACGGTGATCAATGACGTAGNCTCAGGRTCCACGCCGTGACTTTGTGATNCG
              AGCACGATGACCGATGACGTAGACTGAGGGTCCGTGCCGCGACTTTGTGATGCG
              ACCTCGGTGATTGATGACGTAGACTGCGGGTCCATGCCGCGATTTTGCGRTGCG
              ACCTCGATGCTCGATGACGTAGACTGCGGGTCCATGCCGTGATTTTGCGATGCG

Data file with empirical frequencies and a bootstrap resampling:

    5 114 B
    B 137
    Sequence1 ACACGGTGTCGTATCATGCTGCAGGATGCTAGACTGCGTCANATGTTCGTACTAACTGTG
    Sequence2 ACGCGGTGTCGTGTCATGCTACATTATGCTAGACTGCGTCGGATGCTCGTATTGACTGCG
    Sequence3 ACGCGGTGCCGTGTNATGCTGCATTATGCTCGACTGCGRCGGATGCTAGTATTGACTGCG
    Sequence4 ACGCGCTGCCGTGTCATCCTACACGATGCYAGACAGCGTCAGCTGCTAGTACTGGCTGAG
    Sequence5 ACGCGCTGTCGTGTCATACTGCAGGATGCTAGACTGCGTCAGCTGCTAGTACTGGCTGAG

              AGCTCGATGATCGGTGACGTAGACTCAGGGGCCATGCCGCGAGTTTGCGATGCG
              AGCACGGTGATCAATGACGTAGNCTCAGGRTCCACGCCGTGACTTTGTGATNCG
              AGCACGATGACCGATGACGTAGACTGAGGGTCCGTGCCGCGACTTTGTGATGCG
              ACCTCGGTGATTGATGACGTAGACTGCGGGTCCATGCCGCGATTTTGCGRTGCG
              ACCTCGATGCTCGATGACGTAGACTGCGGGTCCATGCCGTGATTTTGCGATGCG

Data with weighting mask and rate categories:

    5 114 W C
    Weights   111111111111001100000100011111100000000000000110000110000000
              111101111111111111111111011100000111001011100000000011
    C 10 0.0625 0.125 0.25 0.5 1 2 4 8 16 32
    Categories 5111136343678975AAA8949995566778888889AAAAAA9239898629AAAAA9
               633792246624457364222574877188898132984963499AA9899975
    Sequence1 ACACGGTGTCGTATCATGCTGCAGGATGCTAGACTGCGTCANATGTTCGTACTAACTGTG
    Sequence2 ACGCGGTGTCGTGTCATGCTACATTATGCTAGACTGCGTCGGATGCTCGTATTGACTGCG
    Sequence3 ACGCGGTGCCGTGTNATGCTGCATTATGCTCGACTGCGRCGGATGCTAGTATTGACTGCG
    Sequence4 ACGCGCTGCCGTGTCATCCTACACGATGCYAGACAGCGTCAGCTGCTAGTACTGGCTGAG
    Sequence5 ACGCGCTGTCGTGTCATACTGCAGGATGCTAGACTGCGTCAGCTGCTAGTACTGGCTGAG

              AGCTCGATGATCGGTGACGTAGACTCAGGGGCCATGCCGCGAGTTTGCGATGCG
              AGCACGGTGATCAATGACGTAGNCTCAGGRTCCACGCCGTGACTTTGTGATNCG
              AGCACGATGACCGATGACGTAGACTGAGGGTCCGTGCCGCGACTTTGTGATGCG
              ACCTCGGTGATTGATGACGTAGACTGCGGGTCCATGCCGCGATTTTGCGRTGCG
              ACCTCGATGCTCGATGACGTAGACTGCGGGTCCATGCCGTGATTTTGCGATGCG

Data with three user-specified tree branching orders:

    5 114 U
    Sequence1 ACACGGTGTCGTATCATGCTGCAGGATGCTAGACTGCGTCANATGTTCGTACTAACTGTG
    Sequence2 ACGCGGTGTCGTGTCATGCTACATTATGCTAGACTGCGTCGGATGCTCGTATTGACTGCG
    Sequence3 ACGCGGTGCCGTGTNATGCTGCATTATGCTCGACTGCGRCGGATGCTAGTATTGACTGCG
    Sequence4 ACGCGCTGCCGTGTCATCCTACACGATGCYAGACAGCGTCAGCTGCTAGTACTGGCTGAG
    Sequence5 ACGCGCTGTCGTGTCATACTGCAGGATGCTAGACTGCGTCAGCTGCTAGTACTGGCTGAG

              AGCTCGATGATCGGTGACGTAGACTCAGGGGCCATGCCGCGAGTTTGCGATGCG
              AGCACGGTGATCAATGACGTAGNCTCAGGRTCCACGCCGTGACTTTGTGATNCG
              AGCACGATGACCGATGACGTAGACTGAGGGTCCGTGCCGCGACTTTGTGATGCG
              ACCTCGGTGATTGATGACGTAGACTGCGGGTCCATGCCGCGATTTTGCGRTGCG
              ACCTCGATGCTCGATGACGTAGACTGCGGGTCCATGCCGTGATTTTGCGATGCG
    3
    (Sequence1,(Sequence2,Sequence3),(Sequence4,Sequence5));
    (Sequence2,(Sequence1,Sequence3),(Sequence4,Sequence5));
    (Sequence3,(Sequence1,Sequence2),(Sequence4,Sequence5));

Data with transition/transversion ratio and base frequencies to simulate Jukes & Cantor model:

    5 114 T F
    T 0.501
    F 0.25 0.25 0.25 0.25
    Sequence1 ACACGGTGTCGTATCATGCTGCAGGATGCTAGACTGCGTCANATGTTCGTACTAACTGTG
    Sequence2 ACGCGGTGTCGTGTCATGCTACATTATGCTAGACTGCGTCGGATGCTCGTATTGACTGCG
    Sequence3 ACGCGGTGCCGTGTNATGCTGCATTATGCTCGACTGCGRCGGATGCTAGTATTGACTGCG
    Sequence4 ACGCGCTGCCGTGTCATCCTACACGATGCYAGACAGCGTCAGCTGCTAGTACTGGCTGAG
    Sequence5 ACGCGCTGTCGTGTCATACTGCAGGATGCTAGACTGCGTCAGCTGCTAGTACTGGCTGAG

              AGCTCGATGATCGGTGACGTAGACTCAGGGGCCATGCCGCGAGTTTGCGATGCG
              AGCACGGTGATCAATGACGTAGNCTCAGGRTCCACGCCGTGACTTTGTGATNCG
              AGCACGATGACCGATGACGTAGACTGAGGGTCCGTGCCGCGACTTTGTGATGCG
              ACCTCGGTGATTGATGACGTAGACTGCGGGTCCATGCCGCGATTTTGCGRTGCG
              ACCTCGATGCTCGATGACGTAGACTGCGGGTCCATGCCGTGATTTTGCGATGCG

Non-interleaved data:

    5 114 I
    Sequence1 ACACGGTGTCGTATCATGCTGCAGGATGCTAGACTGCGTCANATGTTCGTACTAACTGTG
              AGCTCGATGATCGGTGACGTAGACTCAGGGGCCATGCCGCGAGTTTGCGATGCG
    Sequence2 ACGCGGTGTCGTGTCATGCTACATTATGCTAGACTGCGTCGGATGCTCGTATTGACTGCG
              AGCACGGTGATCAATGACGTAGNCTCAGGRTCCACGCCGTGACTTTGTGATNCG
    Sequence3 ACGCGGTGCCGTGTNATGCTGCATTATGCTCGACTGCGRCGGATGCTAGTATTGACTGCG
              AGCACGATGACCGATGACGTAGACTGAGGGTCCGTGCCGCGACTTTGTGATGCG
    Sequence4 ACGCGCTGCCGTGTCATCCTACACGATGCYAGACAGCGTCAGCTGCTAGTACTGGCTGAG
              ACCTCGGTGATTGATGACGTAGACTGCGGGTCCATGCCGCGATTTTGCGRTGCG
    Sequence5 ACGCGCTGTCGTGTCATACTGCAGGATGCTAGACTGCGTCAGCTGCTAGTACTGGCTGAG
              ACCTCGATGCTCGATGACGTAGACTGCGGGTCCATGCCGTGATTTTGCGATGCG

Non-interleaved data by editing a GenBank format (make sure that the names are padded to at least ten characters with blanks):

    5 114 I
    Sequence1
            1 ACACGGTGTC GTATCATGCT GCAGGATGCT AGACTGCGTC ANATGTTCGT ACTAACTGTG
           61 AGCTCGATGA TCGGTGACGT AGACTCAGGG GCCATGCCGC GAGTTTGCGA TGCG
    Sequence2
            1 ACGCGGTGTC GTGTCATGCT ACATTATGCT AGACTGCGTC GGATGCTCGT ATTGACTGCG
           61 AGCACGGTGA TCAATGACGT AGNCTCAGGR TCCACGCCGT GACTTTGTGA TNCG
    Sequence3
            1 ACGCGGTGCC GTGTNATGCT GCATTATGCT CGACTGCGRC GGATGCTAGT ATTGACTGCG
           61 AGCACGATGA CCGATGACGT AGACTGAGGG TCCGTGCCGC GACTTTGTGA TGCG
    Sequence4
            1 ACGCGCTGCC GTGTCATCCT ACACGATGCY AGACAGCGTC AGCTGCTAGT ACTGGCTGAG
           61 ACCTCGGTGA TTGATGACGT AGACTGCGGG TCCATGCCGC GATTTTGCGR TGCG
    Sequence5
            1 ACGCGCTGTC GTGTCATACT GCAGGATGCT AGACTGCGTC AGCTGCTAGT ACTGGCTGAG
           61 ACCTCGATGC TCGATGACGT AGACTGCGGG TCCATGCCGT GATTTTGCGA TGCG

Data analysis restarted from a four-taxon tree (which happens to be wrong, but it will be corrected by local rearrangements after the tree is read):

    5 114 R
    Sequence1 ACACGGTGTCGTATCATGCTGCAGGATGCTAGACTGCGTCANATGTTCGTACTAACTGTG
    Sequence2 ACGCGGTGTCGTGTCATGCTACATTATGCTAGACTGCGTCGGATGCTCGTATTGACTGCG
    Sequence3 ACGCGGTGCCGTGTNATGCTGCATTATGCTCGACTGCGRCGGATGCTAGTATTGACTGCG
    Sequence4 ACGCGCTGCCGTGTCATCCTACACGATGCYAGACAGCGTCAGCTGCTAGTACTGGCTGAG
    Sequence5 ACGCGCTGTCGTGTCATACTGCAGGATGCTAGACTGCGTCAGCTGCTAGTACTGGCTGAG

              AGCTCGATGATCGGTGACGTAGACTCAGGGGCCATGCCGCGAGTTTGCGATGCG
              AGCACGGTGATCAATGACGTAGNCTCAGGRTCCACGCCGTGACTTTGTGATNCG
              AGCACGATGACCGATGACGTAGACTGAGGGTCCGTGCCGCGACTTTGTGATGCG
              ACCTCGGTGATTGATGACGTAGACTGCGGGTCCATGCCGCGATTTTGCGRTGCG
              ACCTCGATGCTCGATGACGTAGACTGCGGGTCCATGCCGTGATTTTGCGATGCG
    (Sequence4:0.1,Sequence2:0.1,(Sequence1:0.1,Sequence5:0.1):0.1):0.0;

Data analysis restarted from a four-taxon tree (which is wrong, and which will not be corrected after the tree is read due to the suppression of all rearrangements by the global 0 0 option):

    5 114 R G T
    G 0 0
    T 2.0
    Sequence1 ACACGGTGTCGTATCATGCTGCAGGATGCTAGACTGCGTCANATGTTCGTACTAACTGTG
    Sequence2 ACGCGGTGTCGTGTCATGCTACATTATGCTAGACTGCGTCGGATGCTCGTATTGACTGCG
    Sequence3 ACGCGGTGCCGTGTNATGCTGCATTATGCTCGACTGCGRCGGATGCTAGTATTGACTGCG
    Sequence4 ACGCGCTGCCGTGTCATCCTACACGATGCYAGACAGCGTCAGCTGCTAGTACTGGCTGAG
    Sequence5 ACGCGCTGTCGTGTCATACTGCAGGATGCTAGACTGCGTCAGCTGCTAGTACTGGCTGAG

AGCTCGATGATCGGTGACGTAGACTCAGGGGCCATGCCGCGAGTTTGCGATGCG
AGCACGGTGATCAATGACGTAGNCTCAGGRTCCACGCCGTGACTTTGTGATNCG
AGCACGATGACCGATGACGTAGACTGAGGGTCCGTGCCGCGACTTTGTGATGCG
ACCTCGGTGATTGATGACGTAGACTGCGGGTCCATGCCGCGATTTTGCGRTGCG
ACCTCGATGCTCGATGACGTAGACTGCGGGTCCATGCCGTGATTTTGCGATGCG
(Sequence4:0.1,Sequence2:0.1,(Sequence1:0.1,Sequence5:0.1):0.1):0.0;

fastDNAml_1.2.2/docs/fastDNAml_scripts.txt

Shell Scripts for use with fastDNAml and DNArates

SUMMARY

UNIX shell scripts have proven quite useful in running the fastDNAml and/or DNArates programs. They fall into two categories. First, many of the program options can be invoked by simple editing of the input, and a script can do that editing. Second, there are scripts that help run the program and maintain its results.

bootstrap           add B (bootstrap) option (and optional seed) to input
categories          add C (rate categories) option and values to input
categories_file     add Y (categories file) option to input (DNArates)
clean_checkpoints   remove checkpoint files when there is a finished treefile
clean_jumbles       remove all but one optimal jumble for a given result
fastDNAml_boot      loop over bootstrap seeds, doing 1 or more jumbles each
fastDNAml_loop      do jumbles, stopping when same best tree found n times
frequencies         add F option and user-defined frequencies to input
global              add G (global) option (and optional region size) to input
jumble              add J (jumble) option (and optional seed) to input
min_info            add M (minimum information) option and value to input
n_categories        add C (categories) option (without rate values) to input
out.PID             append process ID to output file name of a program
outgroup            add O (outgroup) option and number to input
printdata           add 1 (print data) option to input
quickadd            add Q (quickadd) option to input
restart             add R (restart) option and checkpoint tree to input
scores              summarize and sort likelihoods from jumble output files
transition          add T (transition/transversion) option and value to input
treefile            add Y (treefile) option to input
trees2NEXUS         combine trees and add a NEXUS wrapper for PAUP and MacClade
trees2prolog        convert Newick format trees to prolog facts
userlengths         add L (userlengths) option to input
usertree            add U (usertree) option, tree count, and tree(s) to input
usertrees           add U (usertree) option, tree count, and tree(s) to input
weights             add W (userweight) option and values to the input
weights_categories  add W and C options and values to the input

SCRIPTS THAT INVOKE DNAML OPTIONS

GENERAL COMMENTS:

The program fastDNAml takes data from standard input. Thus, to run the program with data in the file called "infile", the command would be

     fastDNAml < infile > outfile

Because of the use of standard input, the input to fastDNAml can be preprocessed by a script and then piped to the program. For example,

     bootstrap < infile | fastDNAml > outfile

or

     bootstrap 137 < infile | fastDNAml > outfile

can be used to add the bootstrap option and a random number seed to the input, and then pass it on to fastDNAml for analysis.

Many of the fastDNAml options are amenable to this arrangement. In each case, the preprocessing can simply add options (and auxiliary data lines, as necessary) to the input. In addition to avoiding the need to play with UNIX text editors, there are several advantages to this approach:

1. The files remain relatively compatible with PHYLIP DNAML.
2. It reduces the chance of introducing errors into the data.
3. It is easier to try alternative options on the same data.
4. If the data for each sequence are provided in one long line (so that interleaved and non-interleaved formats are the same), then some text editors will truncate the lines.

Shell scripts are available for each of the above program options. The corresponding formats and effects are described below.
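The preprocessing idea described above can be sketched as a single shell function. This is only an illustration modeled on the one-option scripts in the distribution (read the header line, append the option letter, write an auxiliary line when one is needed, pass the rest through). The function name add_option and the two-line sample input are hypothetical and are not part of fastDNAml.

```shell
#!/bin/sh
# Hypothetical helper (not part of the distribution): add one option
# letter, and an optional auxiliary data line, to PHYLIP-style input
# read from standard input.
add_option() {
    letter="$1"; shift
    IFS= read -r first_line          # header line: ntaxa nsites [options]
    echo "$first_line $letter"       # append the option letter to the header
    if test $# -gt 0; then
        echo "$letter $*"            # auxiliary line, for example "T 2.0"
    fi
    cat -                            # pass the remaining data through unchanged
}

# Chain two filters, in the same way the option scripts are chained.
printf '5 114\nSequence1 ACGT\n' | add_option T 2.0 | add_option O 5
```

With the sample input, the header becomes "5 114 T O", followed by the auxiliary lines "O 5" and "T 2.0" and then the untouched sequence line. Each filter in the chain inserts its auxiliary line directly after the header, so the last filter's line comes first.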
THE SCRIPTS:

BOOTSTRAP (B)

Format:   bootstrap [random_seed]
Example:  bootstrap < infile | fastDNAml > outfile
Example:  bootstrap 137 < infile | fastDNAml > outfile

Adds a bootstrap option and a random number seed to the input. If the random seed is not supplied, then the process ID of the bootstrap shell is used. Thus, repeated executions of the first example will tend to generate different random samples (note that many systems only use about 32000 process IDs, so once you get above 100 repetitions, reuse of the same number may become a significant concern).

CATEGORIES (C)

Format:   categories categories_data_file
Example:  categories archae.rates archaea.out

Adds the categories option and the corresponding data to the input. The data must have the format specified for PHYLIP dnaml 3.3. The first line must be the letter C, followed by the number of categories (a number in the range 1 through 35), and then a blank-separated list of the rates for each category. (The list can take more than one line; the program reads until it finds the specified number of rate values.) The next line should be the word Categories followed by one rate category character per sequence position. The categories 1 - 35 are represented by the series 1, 2, 3, ..., 8, 9, A, B, C, ..., Y, Z. These latter data can be on one or more lines.
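As a rough illustration of the character coding just described, the following sketch maps one category character to its category number. The function name category_number is hypothetical. This is not code from fastDNAml, and it accepts only the digits 1-9 and upper-case letters A-Z.

```shell
#!/bin/sh
# Hypothetical helper: convert a category character (1-9, A-Z) to its
# category number (1-35) by locating it in the ordered symbol series.
category_number() {
    symbols="123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    case "$1" in
        [1-9A-Z]) ;;                         # exactly one valid symbol
        *) echo "invalid category: $1" >&2; return 1 ;;
    esac
    prefix="${symbols%%"$1"*}"               # everything before the symbol
    echo $(( ${#prefix} + 1 ))               # 1-based position in the series
}

category_number 9
category_number A
category_number Z
```

The three calls print 9, 10, and 35. A complete categories data file combines the C line with the Categories line.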
For example, C 12 0.0625 0.125 0.25 0.5 1 2 4 8 16 32 64 128 Categories 5111136343678975AAA8949995566778888889AAAAAA9239898629AAAAA9 633792246624457364222574877188898132984963499AA9899975 or, with more categories, C 35 0.16529 0.29525 0.34482 0.40272 0.47035 0.54933 0.64157 0.74930 0.87512 1.02207 1.19369 1.39413 1.62823 1.90164 2.22096 2.59389 3.02945 3.53815 4.13227 4.82615 5.63654 6.58301 7.68841 8.97943 10.48723 12.24822 14.30490 16.70694 19.51232 22.78878 26.61541 31.08459 36.30423 42.40033 256.00000 Categories 4HHZ282111 21ED48H1HD Z1CD171411 1118F111EI IHI8ELBZZZ ZZZZZZZZZZ ZZZZZZZZZZ 1MJZZMJLKL ZKL1ZZZZZZ ZZZZZZZZZZ ZZZZZZZZGH HHIGG43FOZ Z2B9111324 1ZZZ171Z11 1184GH11ZZ IB1BBZ111J IB1ILKF4L1 21AEDE8111 111111ED9K 2219L3HGJ1 1Z1ZZMONMH ZZOMSQLM8Z 11411 (Notice that spaces are permitted in the categories data, and that the values can extend across multiple lines. However, this means that extra values are not permitted.) In order to generate output compatible with PHYLIP dnaml v3.3, this should be the first option added (so that the categories data are inserted immediately before the sequence data). CATEGORIES_FILE (Y) Format: categories_file Adds the Y option to the input data for the DNArates program. Makes the program write a file of weights and categories that can be directly added to the input for the fastDNAml program (see weights_categories script). Example: categories_file archaea.out Adds the outgroup option and appropriate auxiliary data line to the input. The example will infer a tree for the archaea data, root it on sequence 5, and write a tree to treefile.PID, where PID is a number (the process ID of fastDNAml). The textual output from fastDNAml (a description of the analysis) is written to archaea.out. PRINTDATA (1) Format: printdata Example: printdata archaea.out Adds a printdata option to the input. In the example, the file archaea.out will include an echoing of the data in addition to the usual output. 
QUICKADD (Q) Format: quickadd Example: quickadd archaea.out Adds a quickadd option to the input. This greatly decreases the time in initially placing a new sequence in the growing tree (but does not change the time required to subsequently test rearrangements). This will probably become the default program behavior in the near future. Any possible downside of the quickadd option would be a decreased frequency of finding the globally optimal tree. Since you should NEVER depend on a single order of addition yielding the best tree, multiple jumble runs will still be the best way to check the reproducibility of any presumptively optimal tree. Quickadd should let you do this more quickly! RESTART (R) Format: restart checkpoint_file_name Example: quickadd archaea.out Example: transition 2.0 /dev/null; do if test $# -lt 2; then break elif test $1 = "$nosummaryflag"; then summary=0; shift elif test $1 = "-n"; then summary=0; shift elif test $1 = "$summaryflag"; then summary=1; shift elif test $1 = "-s"; then summary=1; shift elif test $1 = "-"; then shift; break else echo "Bad flag: $*"; shift $#; break fi done if test $# -ne 1; then echo "Usage: $comm [ $nosummaryflag ] file_name_root" exit fi out="$1" # Check requested name if test `ls -d $out.[0-9]* 2>/dev/null | wc -l` -eq 0; then echo "$comm: No files found of form $out.(jumble_seed)" exit elif grep '^Ln Likelihood' $out.[0-9]* >/dev/null; then : else echo "$comm: No likelihoods found in files of form $out.(jumble_seed)" exit fi # Summary exists summarized=0 if test -f "$out.summary" -a $summary -gt 0; then if test ! -f "$out.tree" -o ! -f "$out.out"; then echo " $comm: Summary file $out.summary exists, but corresponding output and tree files ($out.out and $out.tree) cannot be found. Cleaning aborted. " exit else echo " Summary file $out.summary exists. New jumbles will be added to that summary without further checking. 
" summarized=1 fi fi # Don't clobber an existing file if test $summarized -eq 0 -a \( -f "$out.tree" -o -f "$out.out" \); then echo " $comm: File(s) with the name(s) $out.out and/or $out.tree already exist and would be clobbered by 'cleaning' the jumble output files. Move them to a new name and try again. " exit fi # Find best file PID with the given name if test $summarized -eq 0; then pid=`grep '^Ln Likelihood' $out.[0-9]* /dev/null | sed 's/^\(.*\):Ln Like.*=\(.*\)$/\2 \1/' | sort -nr +0 | head -1 | sed -e 's/^[^ ]* //' -e 's/^.*\.//'` fi # Write score summary file, if requested if test $summary -gt 0; then if test $summarized -eq 0; then grep '^Ln Likelihood' $out.[0-9]* /dev/null | sed 's/^\(.*\):\(Ln Like.*\)$/\2 (file: \1)/' | sort -nr +3 > "$out.summary" else temp_name="$out.`uname -n`.$$" mv "$out.summary" "$temp_name" grep '^Ln Likelihood' $out.[0-9]* /dev/null | sed 's/^\(.*\):\(Ln Like.*\)$/\2 (file: \1)/' | cat "$temp_name" - | sort -nr +3 > "$out.summary" rm -f "$temp_name" fi fi # Move output and treefile to new names if test $summarized -eq 0; then oldname="$out.$pid" if grep "^Ln Likelihood" "$oldname" >/dev/null; then newname="$out.out" treenew="$out.tree" treeold="treefile.$pid" checkpt="checkpoint.$pid" if test -f "$treeold"; then mv "$treeold" "$treenew" elif test -f "$checkpt"; then tail -1 "$checkpt" >"$treenew" else echo "$comm: Cannot find tree file. Cleaning aborted."; exit fi mv "$oldname" "$newname" rm -f "$checkpt" fi fi # Remove other output, tree and checkpoint files: if test `ls -d $out.[0-9]* 2>/dev/null | wc -l` -gt 0; then pids=`grep '^Ln Likelihood' $out.[0-9]* /dev/null | sed -e 's/^\(.*\):Ln Like.*$/\1/' -e 's/^.*\.//'` for pid in $pids; do rm -f "$out.$pid" "treefile.$pid" "checkpoint.$pid" done fi fastDNAml_1.2.2/scripts/dnaml_progress010064400000410000013000000002670703414345500203070ustar00garyarchae00000400000020#! 
/bin/sh # (for file in $*; do tail -1 $file | sed 's:^:'"$file"' :'; done) | \ sed -e 's/^\([^ ]*\) .*\(likelihood =[^,]*,[^,]*\),.*$/\2 (file = \1)/' | \ sort -nr +5 -6 +2 -3 fastDNAml_1.2.2/scripts/fastDNAml_boot010064400000410000013000000136420703414345500201250ustar00garyarchae00000400000020#! /bin/sh # comm=`echo "$0" | sed -e 's&^.*/&&g'` bootflag="-boots" cleanflag="-noclean" maxflag="-max" outflag="-out" niceflag="-nice" seedflag="-seed" stdopt="| jumble" cleanUp=1 jumbles=10 nice=10 remaining=1 saveOut=0 seed="$$`date +%M%S`" # The spaces in the echo and grep are required because of a "feature" that # causes /bin/sh echo to consume ANY leading argument that begins with -n. while echo " $1" | grep "^ -" >/dev/null; do if test $# -lt 2; then break elif test $1 = $maxflag; then jumbles=$2; shift; shift elif test $1 = "-m"; then jumbles=$2; shift; shift elif test $1 = $cleanflag; then cleanUp=0; shift elif test $1 = "-c"; then cleanUp=0; shift elif test $1 = $niceflag; then nice=$2; shift; shift elif test $1 = "-n"; then nice=$2; shift; shift elif test $1 = $outflag; then saveOut=1; shift elif test $1 = "-o"; then saveOut=1; shift elif test $1 = $seedflag; then seed=$2; shift; shift elif test $1 = "-s"; then seed=$2; shift; shift elif test $1 = $bootflag; then remaining=$2; shift; shift elif test $1 = "-b"; then remaining=$2; shift; shift elif test $1 = "-"; then shift; break else echo "Bad flag: $*"; while test $# -gt 0; do shift; done; break fi done if test $# -eq 2; then opts="$stdopt"; elif test $# -eq 3; then if test -n "$3"; then opts="$stdopt | $3" else opts="$stdopt" fi else cleanprm="[$cleanflag]" saveprm="[$saveflag]" cntprm="[$bootflag nboot]" maxprm="[$maxflag maxjumble]" niceprm="[$niceflag nicevalue]" seedprm="[$seedflag seed]" optprm="[ "'"'"dnaml_opt1 [ | dnaml_opt2 [...]]"'"'" ]" echo " Usage: $comm $cntprm $seedprm\\ $maxprm $niceprm $cleanprm $saveprm\\ in_file n_best $optprm For the current bootstrap seed, the sequence input order is 
jumbled (up to maxjumble times) until the same best tree is found n_best times. The output files are then reduced to a summary of the scores produced by jumbling, and one example of the best tree. The number process is then repeated with new bootstrap seeds until nboot samples have been analyzed. Boot and jumble are included by the script and should not be specified by the user or in the data file. Additional fastDNAml program options are enclosed in quotes, and separated by vertical bars (|). Flags and parameters: in_file -- name of the input data file n_best -- input order is jumbled (up to maxjumble times) until same tree is found n_best times $bootflag nboot -- number of different bootstrap samples (Default=1) $seedflag seed -- seed for first bootstrap (Default is based on the process ID and time of day) $maxflag maxjumble -- maximum attempts at replicating inferred tree (Default=10) $niceflag nicevalue -- run fastDNAml with specified nice value (Default=10) $cleanflag -- inhibits cleanup of the files for the individual jumbles $saveflag -- inhibits cleanup of the text output from fastDNAml " exit fi if test $cleanUp -ne 0; then cleanflag=""; fi if test $saveOut -eq 0; then outflag=""; fi if test -f "$1"; then root=`echo "$1" | sed -e 's/\.phylip$//' -e 's/\.phy$//'`; in="$1" elif test -f "$1.phy"; then root="$1"; in="$1.phy" elif test -f "$1.phylip"; then root="$1"; in="$1.phylip" else echo "$comm: Unable to find input file: $1"; exit fi seed=`echo $seed | awk '{printf("%09d",$1)}'` out=`echo "${root}_$seed" | sed -e 's&^.*/&&'` # Check for reuse of same random seed: if test ! -f "$out.tree" -a ! 
-f "$out.out"; then # Loop over jumble orders: while if test `ls -d $out.[0-9]* 2>/dev/null | wc -l` -gt 0; then nJumble=`grep '^Ln Likelihood' $out.[0-9]* /dev/null | wc -l` nBest=`grep '^Ln Likelihood' $out.[0-9]* /dev/null | sed -e 's/^.*:Ln Likelihood =\(.*\)$/\1/g' | sort -nr +0 | awk 'BEGIN{c=0} NR==1{b=$1-0.001} $1>=b{c++} END{print c}'` else nBest=0 nJumble=0 fi test $nBest -lt $2 -a $nJumble -lt $jumbles do eval "bootstrap $seed < $in $opts | nice -$nice out.PID fastDNAml $out" >/dev/null || exit done if test $cleanUp -ne 0; then # # clean_jumbles # # Check for files if test `ls -d $out.[0-9]* 2>/dev/null | wc -l` -eq 0; then echo "$comm: No files found for $out" exit fi # Find file suffix with the best score pid=`grep '^Ln Likelihood' $out.[0-9]* /dev/null | sed 's/^\(.*\):Ln Like.*=\(.*\)$/\2 \1/' | sort -nr +0 | head -1 | sed -e 's/^[^ ]* //' -e 's/^.*\.//'` if test -z "$pid"; then echo "$comm: No likelihoods found for $out" exit fi # Move output and treefile to new names treenew="$out.tree" treeold="treefile.$pid" checkpt="checkpoint.$pid" if test -f "$treeold"; then mv "$treeold" "$treenew" elif test -f "$checkpt"; then tail -1 "$checkpt" >"$treenew" else echo "$comm: Cannot find tree file. 
Bootstrap aborted."; exit fi rm -f "$checkpt" oldname="$out.$pid" if test $saveOut -ne 0; then mv "$oldname" "$out.out" else rm -f "$oldname" fi # Remove other output, tree and checkpoint files: if test `ls -d $out.[0-9]* 2>/dev/null | wc -l` -gt 0; then pids=`grep '^Ln Likelihood' $out.[0-9]* /dev/null | sed -e 's/^\(.*\):Ln Like.*$/\1/' -e 's/^.*\.//'` for pid in $pids; do rm -f "$out.$pid" "treefile.$pid" "checkpoint.$pid" done fi # End of clean_jumbles fi remaining=`expr $remaining - 1` fi # Check number of replicates: if test $remaining -gt 0; then $0 $bootflag $remaining $maxflag $jumbles $cleanflag $outflag $niceflag $nice "$@" & fi input data file n_best -- input order is jumbled (up to maxjumble times) until same tree fastDNAml_1.2.2/scripts/fastDNAml_loop010064400000410000013000000072360703414345600201360ustar00garyarchae00000400000020#! /bin/sh # comm=`echo "$0" | sed -e 's&^.*/&&'` cleanflag="-noclean" maxflag="-max" niceflag="-nice" stdopt="" jumbles=10 nice=10 cleanUp=1 # The spaces in the echo and grep are required because of a "feature" that # causes /bin/sh echo to consume ANY leading argument that begins with -n. 
while echo " $1" | grep "^ -" >/dev/null; do if test $# -lt 2; then break elif test $1 = $maxflag; then jumbles=$2; shift; shift elif test $1 = "-m"; then jumbles=$2; shift; shift elif test $1 = $cleanflag; then cleanUp=0; shift elif test $1 = "-c"; then cleanUp=0; shift elif test $1 = $niceflag; then nice=$2; shift; shift elif test $1 = "-n"; then nice=$2; shift; shift elif test $1 = "-"; then shift; break else echo "Bad flag: $*"; while test $# -gt 0; do shift; done; break fi done if test $# -eq 2; then opts="$stdopt" elif test $# -eq 3; then if test -n "$3"; then opts="$stdopt | $3" else opts="$stdopt" fi else cleanprm="[$cleanflag]" maxprm="[$maxflag maxjumble]" niceprm="[$niceflag nicevalue]" optprm="[ "'"'"dnaml_opt1 [ | dnaml_opt2 [...]]"'"'" ]" echo " Usage: $comm $maxprm $cleanprm $niceprm \\ in_file n_best $optprm For the given input file, the sequence input order is jumbled (up to maxjumble times) until the same best tree is found n_best times. The output files are then reduced to a summary of the scores produced by jumbling, and one example of the best tree. The jumble option is included by the script and should not be specified by the user or in the data file. Additional fastDNAml program options are enclosed in quotes, and separated by vertical bars (|). 
Flags and parameters: in_file -- name of the input data file n_best -- input order is jumbled (up to maxjumble times) until same tree is found n_best times $maxflag maxjumble -- maximum attempts at replicating inferred tree (Default=10) $niceflag nicevalue -- run fastDNAml with specified nice value (Default=10) $cleanflag -- inhibits cleanup of the output files " exit fi if test $cleanUp -ne 0; then cleanflag=""; fi if test -f "$1"; then root=`echo "$1" | sed -e 's/\.phylip$//' -e 's/\.phy$//'`; in="$1" elif test -f "$1.phy"; then root="$1"; in="$1.phy" elif test -f "$1.phylip"; then root="$1"; in="$1.phylip" else echo "$comm: Unable to find input file: $1"; exit fi out=`echo "$root" | sed -e 's&^.*/&&'` # Don't clobber an existing file if test $cleanUp -ne 0 -a \( -f "$out.tree" -o -f "$out.out" \); then echo "" echo "$comm: File(s) with the name(s) $out.out and/or $out.tree" echo "already exist and would be clobbered by 'cleaning' the jumble output" echo "files. Move them to a new name and try again." echo "" exit fi # Loop over jumble orders: while if test $cleanUp -ne 0 -a -f "$out.summary"; then echo "" echo "$comm: Jumbling stopped by existence of summary file:" echo "$out.summary" echo "" jumbles=0 nBest=0 nJumble=0 elif test `ls -d $out.[0-9]* 2>/dev/null | wc -l` -gt 0; then nJumble=`grep '^Ln Likelihood' $out.[0-9]* /dev/null | wc -l` nBest=`grep '^Ln Likelihood' $out.[0-9]* /dev/null | sed -e 's/^.*:Ln Likelihood =\(.*\)$/\1/g' | sort -nr +0 | awk 'BEGIN{c=0} NR==1{b=$1-0.001} $1>=b{c++} END{print c}'` else nBest=0 nJumble=0 fi test $nBest -lt $2 -a $nJumble -lt $jumbles do eval "jumble < $in $opts | nice -$nice out.PID fastDNAml $out" >/dev/null || exit done if test $cleanUp -ne 0; then clean_jumbles "$out"; fi fastDNAml_1.2.2/scripts/frequencies010064400000410000013000000003410703414345600175730ustar00garyarchae00000400000020#! 
/bin/sh # # frequencies shell script # if test $# -ne 4; then echo "Usage: $0 fA fC fG fT"; exit; fi if test $# -gt 0; then echo "Usage: $0"; exit; fi read first_line echo "$first_line F" echo "F $1 $2 $3 $4" cat - fastDNAml_1.2.2/scripts/global010064400000410000013000000004130703414345600165220ustar00garyarchae00000400000020#! /bin/sh # # global shell script # if test $# -gt 2 then echo "Usage: $0 [ full_tree_range [ partial_tree_range ]]" exit fi read first_line echo "$first_line G" if test $# -eq 1; then echo "G $1" ; fi if test $# -eq 2; then echo "G $1 $2" ; fi cat - fastDNAml_1.2.2/scripts/iterate_rates010064400000410000013000000024650703414345700201270ustar00garyarchae00000400000020#! /bin/sh # # iterate_rates file_name_root program_options [cycles] if test $# -lt 2; then echo "Usage: iterate_rates file_name_root [cycles]"; exit fi root0="$1" if test -f "${root0}.phylip"; then suf=phylip elif test -f "${root0}.phy"; then suf=phy else echo "Could not find sequence file ${root0}.phy[lip]" exit fi root=`echo "$root0" | sed 's&^.*/\([^/][^/]*\)$&\1&'` if test $# -gt 2; then cycles=$3; else cycles=0; fi tree1=`ls -1 ${root}_*.tree | tail -1` if test -z "$tree1"; then transition 2 < ${root}.$suf | treefile | quickadd | global 0 0 | fastDNAml >/dev/null mv `ls -t treefile.*|head -1` "${root}.dummy_tree" usertree "${root}.dummy_tree" < ${root0}.$suf | n_categories 35 | treefile | DNAml_rates > /dev/null mv `ls -t treefile.*|head -1` v0=0; v1=1; else v0=`echo "$tree1" | sed 's/^.*_\([0-9][0-9]*\)\.tree$/\1/'` v1=`expr $v0 + 1` fi if test ! -f ${root}_${v0}.$suf; then ln -s ${root}.$suf ${root}_${v0}.$suf; fi usertree $tree1 < ${root}_${v0}.$suf | n_categories 35 | treefile | frequencies | DNAml_rates > ${root}_${v0}.rates mv weight_rate.* ${root}_${v0}.wr if test ! 
-f ${root}_${v1}.$suf; then ln -s ${root}.$suf ${root}_${v1}.$suf; fi fastDNAml_loop -m 20 ${root}_${v1}.$suf 3 "frequencies | weights_categories ${root}_${v0}.wr" fastDNAml_1.2.2/scripts/jumble010064400000410000013000000003250703414345700165430ustar00garyarchae00000400000020#! /bin/sh # # jumble shell script # if test $# -gt 1; then echo "Usage: $0 [ seed ]"; exit; fi read first_line if test $# -lt 1; then random=$$ ; else random=$1 ; fi echo "$first_line J" echo "J $random" cat - fastDNAml_1.2.2/scripts/min_info010064400000410000013000000002540703414345700170640ustar00garyarchae00000400000020#! /bin/sh # # min_info shell script # if test $# -ne 1; then echo "Usage: $0 min_unambiguous_residues"; exit; fi read first_line echo "$first_line M" echo "M $1" cat - fastDNAml_1.2.2/scripts/n_categories010064400000410000013000000002540703414346000177220ustar00garyarchae00000400000020#! /bin/sh # # n_categories shell script # if test $# -ne 1; then echo "Usage: $0 number_of_categories"; exit; fi read first_line echo "$first_line C" echo "C $1" cat - fastDNAml_1.2.2/scripts/n_files010064400000410000013000000001740703414346000167000ustar00garyarchae00000400000020#! /bin/sh # nFile=0 for fileName in "$@"; do if test -f "$fileName"; then nFile=`expr $nFile + 1`; fi done echo $nFile fastDNAml_1.2.2/scripts/out.PID010064400000410000013000000013670703414346000165100ustar00garyarchae00000400000020#! 
/bin/sh # # Run a program, appending its process id to its output file name # outflag="-o" if test $# -eq 2; then out="$2" set - "$1" else while echo " $1" | grep "^ -" >/dev/null; do if test $# -lt 2; then break elif test $1 = $outflag; then out=$2; shift; shift elif test $1 = "-out"; then out=$2; shift; shift elif test $1 = "-"; then shift; break else echo "Bad flag: $*" >&2; while test $# -gt 0; do shift; done; break fi done if test $# -lt 1; then comm=`echo $0 | sed -e 's&^.*/&&g'` echo " Usage: $comm program outfile or $comm $outflag outfile program [program_args] " >&2 exit fi fi echo "$$" exec "$@" > "${out}.$$" fastDNAml_1.2.2/scripts/outgroup010064400000410000013000000002420703414346100171420ustar00garyarchae00000400000020#! /bin/sh # # outgroup shell script # if test $# -ne 1; then echo "Usage: $0 outgroup_number"; exit; fi read first_line echo "$first_line O" echo "O $1" cat - fastDNAml_1.2.2/scripts/printdata010064400000410000013000000002060703414346100172440ustar00garyarchae00000400000020#! /bin/sh # # printdata shell script # if test $# -gt 0; then echo "Usage: $0"; exit; fi read first_line echo "$first_line 1" cat - fastDNAml_1.2.2/scripts/quickadd010064400000410000013000000002050703414346100170420ustar00garyarchae00000400000020#! /bin/sh # # quickadd shell script # if test $# -gt 0; then echo "Usage: $0"; exit; fi read first_line echo "$first_line Q" cat - fastDNAml_1.2.2/scripts/restart010064400000410000013000000012600703414346100167430ustar00garyarchae00000400000020#! /bin/sh # # restart shell script # if test $# -ne 1; then echo "Usage: $0 checkpoint_file"; exit; fi file="$1" if test ! 
-f "$file"; then echo "$0: $file: File not found"; exit; fi lastTwoEnds=`egrep -n '((;)|(\)\.))[ ]*$' "$file" | sed 's/^\([0-9]*\):.*$/\1/' | tail -2` nFound=`echo $lastTwoEnds | wc -w` if test $nFound -eq 0; then echo "$0: Unable to locate end of tree(s) in file" exit fi read first_line echo "$first_line R" cat - # tail -1 "$file" if test $nFound -eq 1; then cat "$file" else penultimateEnd=`echo $lastTwoEnds | sed 's/^\([0-9]*\).*$/\1/'` lastStart=`expr $penultimateEnd + 1` tail +$lastStart "$file" fi fastDNAml_1.2.2/scripts/scores010064400000410000013000000001540703414346200165570ustar00garyarchae00000400000020#! /bin/sh # grep '^Ln Likelihood' $* /dev/null | sed 's/^\([^:]*\):\(.*\)$/\2 (file: \1)/g' | sort -nr +3 fastDNAml_1.2.2/scripts/transition010064400000410000013000000002560703414346200174560ustar00garyarchae00000400000020#! /bin/sh # # transition shell script # if test $# -ne 1; then echo "Usage: $0 transitions/transversions"; exit; fi read first_line echo "$first_line T" echo "T $1" cat - fastDNAml_1.2.2/scripts/treefile010064400000410000013000000004200703414346200170540ustar00garyarchae00000400000020#! /bin/sh # # treefile shell script # if test $# -gt 1; then echo " Usage: $0 [format_option] option 1: Default Newick 8:45 tree 2: prolog fact" exit fi read first_line echo "$first_line Y" if test $# -eq 1; then echo "Y $1"; fi cat - fastDNAml_1.2.2/scripts/treefile2prolog010064400000410000013000000001130703414346300203610ustar00garyarchae00000400000020#! /bin/sh # sed -e '1 s/^/pseudoNewick(/' -e 's/] /], /' -e 's/;/)./' $* fastDNAml_1.2.2/scripts/trees2NEXUS010064400000410000013000000005620703414346300173140ustar00garyarchae00000400000020#! /bin/sh # # Convert one tree per line into PAUP (NEXUS) tree file # Accepts input from named file(s) or standard input # echo "#NEXUS" echo "" echo "Begin Trees;" if test $# -eq 0; then egrep -n "." 
| sed -e 's/:/ = /' -e 's/^/ utree /' else i=0 for file in $*; do i=`expr $i + 1` sed -e 's/^/ utree '"$i"' = /' $file done fi echo "Endblock;" fastDNAml_1.2.2/scripts/trees2prolog010064400000410000013000000001110703414346300177020ustar00garyarchae00000400000020#! /bin/sh # sed -e 's/^/pseudoNewick(/' -e 's/] /], /' -e 's/;/)./' $* fastDNAml_1.2.2/scripts/userlengths010064400000410000013000000002100703414346400176170ustar00garyarchae00000400000020#! /bin/sh # # userlengths shell script # if test $# -gt 0; then echo "Usage: $0"; exit; fi read first_line echo "$first_line L" cat - fastDNAml_1.2.2/scripts/usertree010064400000410000013000000004160703414346400171220ustar00garyarchae00000400000020#! /bin/sh # # usertree shell script # # Modified July 14, 1992 to accept phylip_tree prolog fact. # if test $# -lt 1 -o $# -gt 2; then echo "Usage: $0 treefile [ L ]"; exit; fi read first_line echo "$first_line U $2" cat - egrep '((;)|(\)\.))[ ]*$' $1 | wc -l cat $1 fastDNAml_1.2.2/scripts/usertrees010064400000410000013000000004160703414346400173050ustar00garyarchae00000400000020#! /bin/sh # # usertree shell script # # Modified July 14, 1992 to accept phylip_tree prolog fact. # if test $# -lt 1 -o $# -gt 2; then echo "Usage: $0 treefile [ L ]"; exit; fi read first_line echo "$first_line U $2" cat - egrep '((;)|(\)\.))[ ]*$' $1 | wc -l cat $1 fastDNAml_1.2.2/scripts/weights010064400000410000013000000002410703414346400167320ustar00garyarchae00000400000020#! /bin/sh # # weights option shell script # if test $# -ne 1; then echo "Usage: $0 weight_data_file"; exit; fi read first_line echo "$first_line W" cat $1 - fastDNAml_1.2.2/scripts/weights_categories010064400000410000013000000002730703414346500211450ustar00garyarchae00000400000020#! 
/bin/sh # # weights_categories option shell script # if test $# -ne 1; then echo "Usage: $0 weights+categories_data_file"; exit; fi read first_line echo "$first_line W C" cat $1 - fastDNAml_1.2.2/source/004075500000410000013000000000000703414365200151535ustar00garyarchae00000400000020fastDNAml_1.2.2/source/fastDNAml.c010064400000410000013000004346270703415321100171350ustar00garyarchae00000400000020#define programName "fastDNAml" #define programVersion "1.2.2" #define programVersionInt 10202 #define programDate "January 3, 2000" #define programDateInt 20000103 /* fastDNAml, a program for estimation of phylogenetic trees from sequences. * Copyright (C) 1998, 1999, 2000 by Gary J. Olsen * * This program is free software; you may redistribute it and/or modify it * under the terms of the GNU General Public License as published by the Free * Software Foundation; either version 2 of the License, or (at your option) * any later version. * * This program is distributed in the hope that it will be useful, but * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License * for more details. * * You should have received a copy of the GNU General Public License along * with this program; if not, write to the Free Software Foundation, Inc., * 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. * * For any other enquiries write to Gary J. Olsen, Department of Microbiology, * University of Illinois, Urbana, IL 61801, USA * * Or send E-mail to gary@phylo.life.uiuc.edu * * * fastDNAml is based in part on the program dnaml by Joseph Felsenstein. * * Copyright notice from dnaml: * * version 3.3. (c) Copyright 1986, 1990 by the University of Washington * and Joseph Felsenstein. Written by Joseph Felsenstein. Permission is * granted to copy and use this program provided no fee is charged for it * and provided that this copyright notice is not removed. 
* * * When publishing work that based on results from fastDNAml please cite: * * Felsenstein, J. 1981. Evolutionary trees from DNA sequences: * A maximum likelihood approach. J. Mol. Evol. 17: 368-376. * * and * * Olsen, G. J., Matsuda, H., Hagstrom, R., and Overbeek, R. 1994. * fastDNAml: A tool for construction of phylogenetic trees of DNA * sequences using maximum likelihood. Comput. Appl. Biosci. 10: 41-48. */ /* Conversion to C and changes in sequential code by Gary Olsen, 1991-1994 * * p4 version by Hideo Matsuda and Ross Overbeek, 1991-1993 */ /* * 1.0 March 14, 1992 * Initial "release" version * * 1.0.1 March 18, 1992 * Add ntaxa to tree comments * Set minimum branch length on reading tree * Add blanks around operators in treeString (for prolog parsing) * Add program version to treeString comments * * 1.0.2 April 6, 1992 * Improved option line diagnostics * Improved auxiliary line diagnostics * Removed some trailing blanks from output * * 1.0.3 April 6, 1992 * Checkpoint trees that do not need any optimization * Print restart tree likelihood before optimizing * Fix treefile option so that it really toggles * * 1.0.4 July 13, 1992 * Add support for tree fact (instead of true Newick tree) in * processTreeCom, treeReadLen, str_processTreeCom and * str_treeReadLen * Use bit operations in randum * Correct error in bootstrap mask used with weighting mask * * 1.0.5 August 22, 1992 * Fix reading of underscore as first nonblank character in name * Add strchr and strstr functions to source code * Add output treefile name to message "Tree also written ..." * * 1.0.6 November 20, 1992 * Change (! nsites) test in setupTopol to (nsites == 0) for MIPS R4000 * Add vectorizing compiler directives for CRAY * Include updates and corrections to parallel code from H. Matsuda * * 1.0.7 March 25, 1993 * Remove translation of underlines in taxon names * * 1.0.8 April 30, 1993 * Remove version number from fastDNAml.h file name * * 1.0.9 August 12, 1993 * Version was never released. 
 *         Redefine treefile formats and default:
 *             0  None
 *             1  Newick
 *             2  Prolog
 *             3  PHYLIP (Default)
 *         Remove quote marks and comment from PHYLIP treefile format.
 *
 *  1.1.0  September 3-5, 1993
 *         Arrays of size maxpatterns moved from stack to heap (mallocs) in
 *            evaluateslope, makenewz, and cmpBestTrees.
 *         Correct [maxsites] to [maxpatterns] in temporary array definitions
 *            in Vectorize code of newview and evaluate.  (These should also
 *            get converted to use malloc() at some point.)
 *         Change randum to use 12 bit segments, not 6.  Change its seed
 *            parameter to long *.
 *         Remove the code that took the absolute value of random seeds.
 *         Correct integer divide artifact in setting default transition/
 *            transversion parameter values.
 *         When transition/transversion ratio is "reset", change to small
 *            value, not the program default.
 *         Report the "reset" transition/transversion ratio in the output.
 *         Move global data into analdef, rawDNA, and crunchedDNA structures.
 *         Change names of routines white and digit to whitechar and digitchar.
 *         Convert y[] to yType, which is normally char, but is int if the
 *            Vectorize flag is set.
 *         Split option line reading out of getoptions routine.
 *
 *  1.1.1  September 30, 1994
 *         Incorporate changes made in 1.0.A (Feb. 11, 1994):
 *         Remove processing of quotation marks within comments.
 *         Break label finding into copy to string and find tip.
 *         Generalize tree reading to read trees when names are and are not
 *            already known.
 *         Remove absolute value from randum seed reading.
 *         Include integer version number and program date.
 *         Remove maxsite, maxpatterns and maxsp limitations.
 *         Incorporate code for retaining multiple trees.
 *         Activate code for Hasegawa & Kishino test of tree differences.
 *         Make quick add the default, with Q turning it off.
 *         Make empirical frequencies the default, with F turning them off.
 *         Allow a residue frequency option line anywhere in the options.
 *         Increase string length passed to treeString (should be length
 *            checked, but ...)
 *         Introduce (Sept. 30) and fix (Oct. 26) bug in bootstrap sampling.
 *         Fix error when user frequencies are last line and start with F.
 *
 *  1.2    September 5, 1997
 *         Move likelihood components into structure.
 *         Change rawDNA to rawdata.
 *         Change crunchedDNA to cruncheddata.
 *         Recast the likelihoods per site into an array of structures,
 *            where each structure (likelivector) includes the likelihoods
 *            of each residue type at the site, and a magnitude scale
 *            factor (exp).  This requires changing the space allocation,
 *            newview, makenewz, evaluate, and sigma.
 *         Change code of newview to rescale likelihoods up by 2**256 when
 *            the largest value falls below 2**-256.  This should solve
 *            floating point underflow for all practical sized trees.
 *            No changes are necessary in makenewz or sigma, since only
 *            relative likelihoods are necessary.
 *
 *  1.2.1  March 9, 1998
 *         Convert likelihood adjustment factor (2**256) to a constant.
 *         Fix vectorized calculation of likelihood (error introduced in 1.2)
 *
 *  1.2.2  December 23, 1998
 *         General code clean-up.
 *         Convert to function definitions with parameter type lists
 *
 *  1.2.2  January 3, 2000
 *         Add copyright and license information
 *         Make this the current release version
 */

#ifdef Master
#  undef  Master
#  define Master      1
#  define Slave       0
#  define Sequential  0
#else
#  ifdef Slave
#    undef  Slave
#    define Master      0
#    define Slave       1
#    define Sequential  0
#  else
#    ifdef Sequential
#      undef Sequential
#    endif
#    define Master      0
#    define Slave       0
#    define Sequential  1
#  endif
#endif

#ifdef CRAY
#  define Vectorize
#endif

#ifdef Vectorize
#  define maxpatterns  10000  /* maximum number of different site patterns */
#endif

#include <stdio.h>
#include <math.h>
#include "fastDNAml.h"  /*  Requires version 1.2  */

#if Master || Slave
#  include "p4.h"
#  include "comm_link.h"
#endif

/*  Global variables */

xarray  *usedxtip, *freextip;

#if Sequential     /*  Use standard input */
#  undef  DNAML_STEP
#  define DNAML_STEP  0
#  define INFILE  stdin
#endif

#if Master
#  define MAX_SEND_AHEAD 400
   char   *best_tr_recv = NULL;     /* these are used for flow control */
   double  best_lk_recv;
   int     send_ahead = 0;          /* number of outstanding sends */

#  ifdef DNAML_STEP
#    define DNAML_STEP  1
#  endif
#  define INFILE   Seqf
#  define OUTFILE  Outf
   FILE  *INFILE, *OUTFILE;
   comm_block comm_slave;
#endif

#if Slave
#  undef  DNAML_STEP
#  define DNAML_STEP  0
#  define INFILE   Seqf
#  define OUTFILE  Outf
   FILE  *INFILE, *OUTFILE;
   comm_block comm_master;
#endif

#if Debug
   FILE  *debug;
#endif

#if DNAML_STEP
   int begin_step_time, end_step_time;
#  define REPORT_ADD_SPECS  p4_send(DNAML_ADD_SPECS, DNAML_HOST_ID, NULL, 0)
#  define REPORT_SEND_TREE  p4_send(DNAML_SEND_TREE, DNAML_HOST_ID, NULL, 0)
#  define REPORT_RECV_TREE  p4_send(DNAML_RECV_TREE, DNAML_HOST_ID, NULL, 0)
#  define REPORT_STEP_TIME \
   {\
       char send_buf[80]; \
       end_step_time = p4_clock(); \
       (void) sprintf(send_buf, "%d", end_step_time-begin_step_time); \
       p4_send(DNAML_STEP_TIME, DNAML_HOST_ID, send_buf,strlen(send_buf)+1); \
       begin_step_time = end_step_time; \
   }
#else
#  define REPORT_ADD_SPECS
#  define REPORT_SEND_TREE
#  define REPORT_RECV_TREE
#  define REPORT_STEP_TIME
#endif

/*=======================================================================*/
/*                              PROGRAM                                  */
/*=======================================================================*/
/*                    Best tree handling for dnaml                       */
/*=======================================================================*/

/*  Tip value comparisons
 *
 *  Use void pointers to hide type from other routines.  Only tipValPtr and
 *  cmpTipVal need to be changed to alter the nature of the values compared
 *  (e.g., names instead of node numbers).
 *
 *    cmpTipVal(tipValPtr(nodeptr p), tipValPtr(nodeptr q)) == -1, 0 or 1.
 *
 *  This provides direct comparison of tip values (for example, for
 *  definition of tr->start).
 */

void  *tipValPtr (nodeptr p)
  {
    return  (void *) & p->number;
  }

int  cmpTipVal (void *v1, void *v2)
  { /* cmpTipVal */
    int  i1, i2;

    i1 = *((int *) v1);
    i2 = *((int *) v2);
    return  (i1 < i2) ? -1 : ((i1 == i2) ? 0 : 1);
  } /* cmpTipVal */

/*  These are the only routines that need to UNDERSTAND topologies */

topol  *setupTopol (int maxtips, int nsites)
  { /* setupTopol */
    topol   *tpl;

    if (! (tpl = (topol *) Malloc(sizeof(topol))) ||
        ! (tpl->links = (connptr) Malloc((2*maxtips-3) * sizeof(connect))) ||
        (nsites && ! (tpl->log_f
                = (double *) Malloc(nsites * sizeof(double))))) {
      printf("ERROR: Unable to get topology memory");
      tpl = (topol *) NULL;
      }

    else {
      if (nsites == 0)  tpl->log_f = (double *) NULL;
      tpl->likelihood  = unlikely;
      tpl->start       = (node *) NULL;
      tpl->nextlink    = 0;
      tpl->ntips       = 0;
      tpl->nextnode    = 0;
      tpl->opt_level   = 0;     /* degree of branch swapping explored */
      tpl->scrNum      = 0;     /* position in sorted list of scores */
      tpl->tplNum      = 0;     /* position in sorted list of trees */
      tpl->log_f_valid = 0;     /* log_f value sites */
      tpl->prelabeled  = TRUE;
      tpl->smoothed    = FALSE; /* branch optimization converged?
                                 */
      }

    return  tpl;
  } /* setupTopol */

void  freeTopol (topol *tpl)
  { /* freeTopol */
    Free(tpl->links);
    if (tpl->log_f)  Free(tpl->log_f);
    Free(tpl);
  } /* freeTopol */

int  saveSubtree (nodeptr p, topol *tpl)
    /*  Save a subtree in a standard order so that earlier branches
     *  from a node contain lower value tips than do second branches from
     *  the node.  This code works with arbitrary furcations in the tree.
     */
  { /* saveSubtree */
    connptr  r, r0;
    nodeptr  q, s;
    int      t, t0, t1;

    r0 = tpl->links;
    r = r0 + (tpl->nextlink)++;
    r->p = p;
    r->q = q = p->back;
    r->z = p->z;
    r->descend = 0;                     /* No children (yet) */

    if (q->tip) {
      r->valptr = tipValPtr(q);         /* Assign value */
      }

    else {                              /* Internal node, look at children */
      s = q->next;                      /* First child */
      do {
        t = saveSubtree(s, tpl);        /* Generate child's subtree */

        t0 = 0;                         /* Merge child into list */
        t1 = r->descend;
        while (t1 && (cmpTipVal(r0[t1].valptr, r0[t].valptr) < 0)) {
          t0 = t1;
          t1 = r0[t1].sibling;
          }
        if (t0) r0[t0].sibling = t;  else  r->descend = t;
        r0[t].sibling = t1;

        s = s->next;                    /* Next child */
        } while (s != q);

      r->valptr = r0[r->descend].valptr;   /* Inherit first child's value */
      }                                 /* End of internal node processing */

    return  r - r0;
  } /* saveSubtree */

nodeptr  minSubtreeTip (nodeptr  p0)
  { /* minSubtreeTip */
    nodeptr  minTip, p, testTip;

    if (p0->tip) return p0;

    p = p0->next;
    minTip = minSubtreeTip(p->back);
    while ((p = p->next) != p0) {
      testTip = minSubtreeTip(p->back);
      if (cmpTipVal(tipValPtr(testTip), tipValPtr(minTip)) < 0)
        minTip = testTip;
      }
    return minTip;
  } /* minSubtreeTip */

nodeptr  minTreeTip (nodeptr  p)
  { /* minTreeTip */
    nodeptr  minp, minpb;

    minp  = minSubtreeTip(p);
    minpb = minSubtreeTip(p->back);
    return cmpTipVal(tipValPtr(minp), tipValPtr(minpb)) < 0 ? minp : minpb;
  } /* minTreeTip */

void saveTree (tree *tr, topol *tpl)
    /*  Save a tree topology in a standard order so that first branches
     *  from a node contain lower value tips than do second branches from
     *  the node.  The root tip should have the lowest value of all.
*/ { /* saveTree */ connptr r; double *tr_log_f, *tpl_log_f; int i; tpl->nextlink = 0; /* Reset link pointer */ r = tpl->links + saveSubtree(minTreeTip(tr->start), tpl); /* Save tree */ r->sibling = 0; tpl->likelihood = tr->likelihood; tpl->start = tr->start; tpl->ntips = tr->ntips; tpl->nextnode = tr->nextnode; tpl->opt_level = tr->opt_level; tpl->prelabeled = tr->prelabeled; tpl->smoothed = tr->smoothed; if (tpl_log_f = tpl->log_f) { tr_log_f = tr->log_f; i = tpl->log_f_valid = tr->log_f_valid; while (--i >= 0) *tpl_log_f++ = *tr_log_f++; } else { tpl->log_f_valid = 0; } } /* saveTree */ void copyTopol (topol *tpl1, topol *tpl2) { /* copyTopol */ connptr r1, r2, r10, r20; double *tpl1_log_f, *tpl2_log_f; int i; r10 = tpl1->links; r20 = tpl2->links; tpl2->nextlink = tpl1->nextlink; r1 = r10; r2 = r20; i = 2 * tpl1->ntips - 3; while (--i >= 0) { r2->z = r1->z; r2->p = r1->p; r2->q = r1->q; r2->valptr = r1->valptr; r2->descend = r1->descend; r2->sibling = r1->sibling; r1++; r2++; } if (tpl1->log_f_valid && tpl2->log_f) { tpl1_log_f = tpl1->log_f; tpl2_log_f = tpl2->log_f; tpl2->log_f_valid = i = tpl1->log_f_valid; while (--i >= 0) *tpl2_log_f++ = *tpl1_log_f++; } else { tpl2->log_f_valid = 0; } tpl2->likelihood = tpl1->likelihood; tpl2->start = tpl1->start; tpl2->ntips = tpl1->ntips; tpl2->nextnode = tpl1->nextnode; tpl2->opt_level = tpl1->opt_level; tpl2->prelabeled = tpl1->prelabeled; tpl2->scrNum = tpl1->scrNum; tpl2->tplNum = tpl1->tplNum; tpl2->smoothed = tpl1->smoothed; } /* copyTopol */ boolean restoreTree (topol *tpl, tree *tr) { /* restoreTree */ void hookup(); boolean initrav(); connptr r; nodeptr p, p0; double *tr_log_f, *tpl_log_f; int i; /* Clear existing connections */ for (i = 1; i <= 2*(tr->mxtips) - 2; i++) { /* Uses p = p->next at tip */ p0 = p = tr->nodep[i]; do { p->back = (nodeptr) NULL; p = p->next; } while (p != p0); } /* Copy connections from topology */ for (r = tpl->links, i = 0; i < tpl->nextlink; r++, i++) { hookup(r->p, r->q, r->z); } 
tr->likelihood = tpl->likelihood; tr->start = tpl->start; tr->ntips = tpl->ntips; tr->nextnode = tpl->nextnode; tr->opt_level = tpl->opt_level; tr->prelabeled = tpl->prelabeled; tr->smoothed = tpl->smoothed; if (tpl_log_f = tpl->log_f) { tr_log_f = tr->log_f; i = tr->log_f_valid = tpl->log_f_valid; while (--i >= 0) *tr_log_f++ = *tpl_log_f++; } else { tr->log_f_valid = 0; } return (initrav(tr, tr->start) && initrav(tr, tr->start->back)); } /* restoreTree */ int initBestTree (bestlist *bt, int newkeep, int numsp, int sites) { /* initBestTree */ int i, nlogf; bt->nkeep = 0; if (bt->ninit <= 0) { if (! (bt->start = setupTopol(numsp, sites))) return 0; bt->ninit = -1; bt->nvalid = 0; bt->numtrees = 0; bt->best = unlikely; bt->improved = FALSE; bt->byScore = (topol **) Malloc((newkeep+1) * sizeof(topol *)); bt->byTopol = (topol **) Malloc((newkeep+1) * sizeof(topol *)); if (! bt->byScore || ! bt->byTopol) { fprintf(stderr, "initBestTree: Malloc failure\n"); return 0; } } else if (ABS(newkeep) > bt->ninit) { if (newkeep < 0) newkeep = -(bt->ninit); else newkeep = bt->ninit; } if (newkeep < 1) { /* Use negative newkeep to clear list */ newkeep = -newkeep; if (newkeep < 1) newkeep = 1; bt->nvalid = 0; bt->best = unlikely; } if (bt->nvalid >= newkeep) { bt->nvalid = newkeep; bt->worst = bt->byScore[newkeep]->likelihood; } else { bt->worst = unlikely; } for (i = bt->ninit + 1; i <= newkeep; i++) { nlogf = (i <= maxlogf) ? sites : 0; if (! 
          (bt->byScore[i] = setupTopol(numsp, nlogf)))  break;
      bt->byTopol[i] = bt->byScore[i];
      bt->ninit = i;
      }

    return  (bt->nkeep = MIN(newkeep, bt->ninit));
  } /* initBestTree */

int  resetBestTree (bestlist *bt)
  { /* resetBestTree */
    bt->best     = unlikely;
    bt->worst    = unlikely;
    bt->nvalid   = 0;
    bt->improved = FALSE;
  } /* resetBestTree */

boolean  freeBestTree(bestlist *bt)
  { /* freeBestTree */
    while (bt->ninit >= 0)  freeTopol(bt->byScore[(bt->ninit)--]);
    freeTopol(bt->start);
    return TRUE;
  } /* freeBestTree */

/*  Compare two trees, assuming that each is in standard order.  Return
 *  -1 if first precedes second, 0 if they are identical, or +1 if first
 *  follows second in standard order.  Lower number tips precede higher
 *  number tips.  A tip precedes a corresponding internal node.  Internal
 *  nodes are ranked by their lowest number tip.
 */

int  cmpSubtopol (connptr p10, connptr p1, connptr p20, connptr p2)
  { /* cmpSubtopol */
    connptr  p1d, p2d;
    int  cmp;

    if (! p1->descend && ! p2->descend)          /* Two tips */
      return cmpTipVal(p1->valptr, p2->valptr);

    if (! p1->descend) return -1;                /* p1 = tip, p2 = node */
    if (! p2->descend) return  1;                /* p2 = tip, p1 = node */

    p1d = p10 + p1->descend;
    p2d = p20 + p2->descend;
    while (1) {                                  /* Two nodes */
      if (cmp = cmpSubtopol(p10, p1d, p20, p2d))  return cmp; /* Subtrees */
      if (! p1d->sibling && ! p2d->sibling)  return  0; /* Lists done */
      if (! p1d->sibling) return -1;             /* One done, other not */
      if (! p2d->sibling) return  1;             /* One done, other not */
      p1d = p10 + p1d->sibling;                  /* Neither done */
      p2d = p20 + p2d->sibling;
      }
  } /* cmpSubtopol */

int  cmpTopol (void *tpl1, void *tpl2)
  { /* cmpTopol */
    connptr  r1, r2;
    int      cmp;

    r1 = ((topol *) tpl1)->links;
    r2 = ((topol *) tpl2)->links;
    cmp = cmpTipVal(tipValPtr(r1->p), tipValPtr(r2->p));
    if (cmp)  return cmp;
    return  cmpSubtopol(r1, r1, r2, r2);
  } /* cmpTopol */

int  cmpTplScore (void *tpl1, void *tpl2)
  { /* cmpTplScore */
    double  l1, l2;

    l1 = ((topol *) tpl1)->likelihood;
    l2 = ((topol *) tpl2)->likelihood;
    return  (l1 > l2) ?
-1 : ((l1 == l2) ? 0 : 1); } /* cmpTplScore */ /* Find an item in a sorted list of n items. If the item is in the list, * return its index. If it is not in the list, return the negative of the * position into which it should be inserted. */ int findInList (void *item, void *list[], int n, int (* cmpFunc)()) { /* findInList */ int mid, hi, lo, cmp; if (n < 1) return -1; /* No match; first index */ lo = 1; mid = 0; hi = n; while (lo < hi) { mid = (lo + hi) >> 1; cmp = (* cmpFunc)(item, list[mid-1]); if (cmp) { if (cmp < 0) hi = mid; else lo = mid + 1; } else return mid; /* Exact match */ } if (lo != mid) { cmp = (* cmpFunc)(item, list[lo-1]); if (cmp == 0) return lo; } if (cmp > 0) lo++; /* Result of step = 0 test */ return -lo; } /* findInList */ int findTreeInList (bestlist *bt, tree *tr) { /* findTreeInList */ topol *tpl; tpl = bt->byScore[0]; saveTree(tr, tpl); return findInList((void *) tpl, (void **) (& (bt->byTopol[1])), bt->nvalid, cmpTopol); } /* findTreeInList */ int saveBestTree (bestlist *bt, tree *tr) { /* saveBestTree */ double *tr_log_f, *tpl_log_f; topol *tpl, *reuse; int tplNum, scrNum, reuseScrNum, reuseTplNum, i, oldValid, newValid; tplNum = findTreeInList(bt, tr); tpl = bt->byScore[0]; oldValid = newValid = bt->nvalid; if (tplNum > 0) { /* Topology is in list */ reuse = bt->byTopol[tplNum]; /* Matching topol */ reuseScrNum = reuse->scrNum; reuseTplNum = reuse->tplNum; } /* Good enough to keep? */ else if (tr->likelihood < bt->worst) return 0; else { /* Topology is not in list */ tplNum = -tplNum; /* Add to list (not replace) */ if (newValid < bt->nkeep) bt->nvalid = ++newValid; reuseScrNum = newValid; /* Take worst tree */ reuse = bt->byScore[reuseScrNum]; reuseTplNum = (newValid > oldValid) ? 
newValid : reuse->tplNum; if (tr->likelihood > bt->start->likelihood) bt->improved = TRUE; } scrNum = findInList((void *) tpl, (void **) (& (bt->byScore[1])), oldValid, cmpTplScore); scrNum = ABS(scrNum); if (scrNum < reuseScrNum) for (i = reuseScrNum; i > scrNum; i--) (bt->byScore[i] = bt->byScore[i-1])->scrNum = i; else if (scrNum > reuseScrNum) { scrNum--; for (i = reuseScrNum; i < scrNum; i++) (bt->byScore[i] = bt->byScore[i+1])->scrNum = i; } if (tplNum < reuseTplNum) for (i = reuseTplNum; i > tplNum; i--) (bt->byTopol[i] = bt->byTopol[i-1])->tplNum = i; else if (tplNum > reuseTplNum) { tplNum--; for (i = reuseTplNum; i < tplNum; i++) (bt->byTopol[i] = bt->byTopol[i+1])->tplNum = i; } if (tpl_log_f = tpl->log_f) { tr_log_f = tr->log_f; i = tpl->log_f_valid = tr->log_f_valid; while (--i >= 0) *tpl_log_f++ = *tr_log_f++; } else { tpl->log_f_valid = 0; } tpl->scrNum = scrNum; tpl->tplNum = tplNum; bt->byTopol[tplNum] = bt->byScore[scrNum] = tpl; bt->byScore[0] = reuse; if (scrNum == 1) bt->best = tr->likelihood; if (newValid == bt->nkeep) bt->worst = bt->byScore[newValid]->likelihood; return scrNum; } /* saveBestTree */ int startOpt (bestlist *bt, tree *tr) { /* startOpt */ int scrNum; scrNum = saveBestTree(bt, tr); copyTopol(bt->byScore[scrNum], bt->start); bt->improved = FALSE; return scrNum; } /* startOpt */ int setOptLevel (bestlist *bt, int opt_level) { /* setOptLevel */ int tplNum, scrNum; tplNum = findInList((void *) bt->start, (void **) (&(bt->byTopol[1])), bt->nvalid, cmpTopol); if (tplNum > 0) { bt->byTopol[tplNum]->opt_level = opt_level; scrNum = bt->byTopol[tplNum]->scrNum; } else { scrNum = 0; } return scrNum; } /* setOptLevel */ int recallBestTree (bestlist *bt, int rank, tree *tr) { /* recallBestTree */ if (rank < 1) rank = 1; if (rank > bt->nvalid) rank = bt->nvalid; if (rank > 0) if (! 
restoreTree(bt->byScore[rank], tr)) return FALSE; return rank; } /* recallBestTree */ /*=======================================================================*/ /* End of best tree routines */ /*=======================================================================*/ #if 0 void hang(char *msg) { printf("Hanging around: %s\n", msg); while(1); } #endif boolean getnums (rawdata *rdta) /* input number of species, number of sites */ { /* getnums */ printf("\n%s, version %s, %s,\nCopyright (C) 1998, 1999, 2000 by Gary J. Olsen\n\n", programName, programVersion, programDate); printf("Based in part on Joseph Felsenstein's\n\n"); printf(" Nucleic acid sequence Maximum Likelihood method, version 3.3\n\n\n"); if (fscanf(INFILE, "%d %d", & rdta->numsp, & rdta->sites) != 2) { printf("ERROR: Problem reading number of species and sites\n"); return FALSE; } printf("%d Species, %d Sites\n\n", rdta->numsp, rdta->sites); if (rdta->numsp < 4) { printf("TOO FEW SPECIES\n"); return FALSE; } if (rdta->sites < 1) { printf("TOO FEW SITES\n"); return FALSE; } return TRUE; } /* getnums */ boolean digitchar (int ch) {return (ch >= '0' && ch <= '9'); } boolean whitechar (int ch) { return (ch == ' ' || ch == '\n' || ch == '\t'); } void uppercase (int *chptr) /* convert character to upper case -- either ASCII or EBCDIC */ { /* uppercase */ int ch; ch = *chptr; if ((ch >= 'a' && ch <= 'i') || (ch >= 'j' && ch <= 'r') || (ch >= 's' && ch <= 'z')) *chptr = ch + 'A' - 'a'; } /* uppercase */ int base36 (int ch) { /* base36 */ if (ch >= '0' && ch <= '9') return (ch - '0'); else if (ch >= 'A' && ch <= 'I') return (ch - 'A' + 10); else if (ch >= 'J' && ch <= 'R') return (ch - 'J' + 19); else if (ch >= 'S' && ch <= 'Z') return (ch - 'S' + 28); else if (ch >= 'a' && ch <= 'i') return (ch - 'a' + 10); else if (ch >= 'j' && ch <= 'r') return (ch - 'j' + 19); else if (ch >= 's' && ch <= 'z') return (ch - 's' + 28); else return -1; } /* base36 */ int itobase36 (int i) { /* itobase36 */ if (i < 0) return 
                                    '?';
    else if (i < 10)  return (i      + '0');
    else if (i < 19)  return (i - 10 + 'A');
    else if (i < 28)  return (i - 19 + 'J');
    else if (i < 36)  return (i - 28 + 'S');
    else              return '?';
  } /* itobase36 */

int  findch (int c)
  { /* findch */
    int ch;

    while ((ch = getc(INFILE)) != EOF && ch != c) ;
    return  ch;
  } /* findch */

#if Master || Slave
int  str_findch (char **treestrp, int c)
  { /* str_findch */
    int ch;

    while ((ch = *(*treestrp)++) != '\0' && ch != c) ;
    return  ch;
  } /* str_findch */
#endif

boolean  inputboot(analdef *adef)
    /* read the bootstrap auxiliary info */
  { /* inputboot */
    if (! adef->boot) {
      printf("ERROR: Unexpected Bootstrap auxiliary data line\n");
      return FALSE;
      }
    else if (fscanf(INFILE, "%ld", & adef->boot) != 1 ||
             findch('\n') == EOF) {
      printf("ERROR: Problem reading bootstrap random seed value\n");
      return FALSE;
      }

    return TRUE;
  } /* inputboot */

boolean  inputcategories (rawdata *rdta)
    /* read the category rates and the categories for each site */
  { /* inputcategories */
    int  i, j, ch, ci;

    if (rdta->categs >= 0) {
      printf("ERROR: Unexpected Categories auxiliary data line\n");
      return FALSE;
      }
    if (fscanf(INFILE, "%d", & rdta->categs) != 1) {
      printf("ERROR: Problem reading number of rate categories\n");
      return FALSE;
      }
    if (rdta->categs < 1 || rdta->categs > maxcategories) {
      printf("ERROR: Bad number of categories: %d\n", rdta->categs);
      printf("Must be in range 1 - %d\n", maxcategories);
      return FALSE;
      }

    for (j = 1; j <= rdta->categs
             && fscanf(INFILE, "%lf", &(rdta->catrat[j])) == 1; j++) ;

    if ((j <= rdta->categs) || (findch('\n') == EOF)) {
      printf("ERROR: Problem reading rate values\n");
      return FALSE;
      }

    for (i = 1; i <= nmlngth; i++)  (void) getc(INFILE);

    i = 1;
    while (i <= rdta->sites) {
      ch = getc(INFILE);
      ci = base36(ch);
      if (ci >= 0 && ci <= rdta->categs)
        rdta->sitecat[i++] = ci;
      else if (!
whitechar(ch)) { printf("ERROR: Bad category character (%c) at site %d\n", ch, i); return FALSE; } } if (findch('\n') == EOF) { /* skip to end of line */ printf("ERROR: Missing newline at end of category data\n"); return FALSE; } return TRUE; } /* inputcategories */ boolean inputextra (analdef *adef) { /* inputextra */ if (fscanf(INFILE,"%d", & adef->extra) != 1 || findch('\n') == EOF) { printf("ERROR: Problem reading extra info value\n"); return FALSE; } return TRUE; } /* inputextra */ boolean inputfreqs (rawdata *rdta) { /* inputfreqs */ if (fscanf(INFILE, "%lf%lf%lf%lf", & rdta->freqa, & rdta->freqc, & rdta->freqg, & rdta->freqt) != 4 || findch('\n') == EOF) { printf("ERROR: Problem reading user base frequencies data\n"); return FALSE; } rdta->freqread = TRUE; return TRUE; } /* inputfreqs */ boolean inputglobal (tree *tr) /* input the global option information */ { /* inputglobal */ int ch; if (tr->global != -2) { printf("ERROR: Unexpected Global auxiliary data line\n"); return FALSE; } if (fscanf(INFILE, "%d", &(tr->global)) != 1) { printf("ERROR: Problem reading rearrangement region size\n"); return FALSE; } if (tr->global < 0) { printf("WARNING: Global region size too small;\n"); printf(" value reset to local\n\n"); tr->global = 1; } else if (tr->global == 0) tr->partswap = 0; else if (tr->global > tr->mxtips - 3) { tr->global = tr->mxtips - 3; } while ((ch = getc(INFILE)) != '\n') { /* Scan for second value */ if (! whitechar(ch)) { if (ch != EOF) (void) ungetc(ch, INFILE); if (ch == EOF || fscanf(INFILE, "%d", &(tr->partswap)) != 1 || findch('\n') == EOF) { printf("ERROR: Problem reading insert swap region size\n"); return FALSE; } else if (tr->partswap < 0) tr->partswap = 1; else if (tr->partswap > tr->mxtips - 3) { tr->partswap = tr->mxtips - 3; } if (tr->partswap > tr->global) tr->global = tr->partswap; break; /* Break while loop */ } } return TRUE; } /* inputglobal */ boolean inputjumble (analdef *adef) { /* inputjumble */ if (! 
adef->jumble) { printf("ERROR: Unexpected Jumble auxiliary data line\n"); return FALSE; } else if (fscanf(INFILE, "%ld", & adef->jumble) != 1 || findch('\n') == EOF) { printf("ERROR: Problem reading jumble random seed value\n"); return FALSE; } else if (adef->jumble == 0) { printf("WARNING: Jumble random number seed is zero\n\n"); } return TRUE; } /* inputjumble */ boolean inputkeep (analdef *adef) { /* inputkeep */ if (fscanf(INFILE, "%d", & adef->nkeep) != 1 || findch('\n') == EOF || adef->nkeep < 1) { printf("ERROR: Problem reading number of kept trees\n"); return FALSE; } return TRUE; } /* inputkeep */ boolean inputoutgroup (analdef *adef, tree *tr) { /* inputoutgroup */ if (! adef->root || tr->outgr > 0) { printf("ERROR: Unexpected Outgroup auxiliary data line\n"); return FALSE; } else if (fscanf(INFILE, "%d", &(tr->outgr)) != 1 || findch('\n') == EOF) { printf("ERROR: Problem reading outgroup number\n"); return FALSE; } else if ((tr->outgr < 1) || (tr->outgr > tr->mxtips)) { printf("ERROR: Bad outgroup: '%d'\n", tr->outgr); return FALSE; } return TRUE; } /* inputoutgroup */ boolean inputratio (rawdata *rdta) { /* inputratio */ if (rdta->ttratio >= 0.0) { printf("ERROR: Unexpected Transition/transversion auxiliary data\n"); return FALSE; } else if (fscanf(INFILE,"%lf", & rdta->ttratio)!=1 || findch('\n') == EOF) { printf("ERROR: Problem reading transition/transversion ratio\n"); return FALSE; } return TRUE; } /* inputratio */ /* Y 0 is treeNone (no tree) Y 1 is treeNewick Y 2 is treeProlog Y 3 is treePHYLIP */ boolean inputtreeopt (analdef *adef) { /* inputtreeopt */ if (! 
adef->trout) { printf("ERROR: Unexpected Treefile auxiliary data\n"); return FALSE; } else if (fscanf(INFILE,"%d", & adef->trout) != 1 || findch('\n') == EOF) { printf("ERROR: Problem reading output tree-type number\n"); return FALSE; } else if ((adef->trout < 0) || (adef->trout > treeMaxType)) { printf("ERROR: Bad output tree-type number: '%d'\n", adef->trout); return FALSE; } return TRUE; } /* inputtreeopt */ boolean inputweights (analdef *adef, rawdata *rdta, cruncheddata *cdta) /* input the character weights 0, 1, 2 ... 9, A, B, ... Y, Z */ { /* inputweights */ int i, ch, wi; if (! adef->userwgt || cdta->wgtsum > 0) { printf("ERROR: Unexpected Weights auxiliary data\n"); return FALSE; } for (i = 2; i <= nmlngth; i++) (void) getc(INFILE); cdta->wgtsum = 0; i = 1; while (i <= rdta->sites) { ch = getc(INFILE); wi = base36(ch); if (wi >= 0) cdta->wgtsum += rdta->wgt[i++] = wi; else if (! whitechar(ch)) { printf("ERROR: Bad weight character: '%c'", ch); printf(" Weights in dnaml must be a digit or a letter.\n"); return FALSE; } } if (findch('\n') == EOF) { /* skip to end of line */ printf("ERROR: Missing newline at end of weight data\n"); return FALSE; } return TRUE; } /* inputweights */ boolean getoptions (analdef *adef, rawdata *rdta, cruncheddata *cdta, tree *tr) { /* getoptions */ int ch, i, extranum; adef->boot = 0; /* Don't bootstrap column weights */ adef->empf = TRUE; /* Use empirical base frequencies */ adef->extra = 0; /* No extra runtime info unless requested */ adef->interleaved = TRUE; /* By default, data format is interleaved */ adef->jumble = FALSE; /* Use random addition sequence */ adef->nkeep = 0; /* Keep only the one best tree */ adef->prdata = FALSE; /* Don't echo data to output stream */ adef->qadd = TRUE; /* Smooth branches globally in add */ adef->restart = FALSE; /* Restart from user tree */ adef->root = FALSE; /* User-defined outgroup rooting */ adef->trout = treeDefType; /* Output tree file */ adef->trprint = TRUE; /* Print tree to output 
stream */ rdta->categs = 0; /* No rate categories */ rdta->catrat[1] = 1.0; /* Rate values */ rdta->freqread = FALSE; /* User-defined frequencies not read yet */ rdta->ttratio = 2.0; /* Transition/transversion rate ratio */ tr->global = -1; /* Default search locale for optimum */ tr->mxtips = rdta->numsp; tr->outgr = 1; /* Outgroup number */ tr->partswap = 1; /* Default to swap locally after insert */ tr->userlen = FALSE; /* User-supplied branch lengths */ adef->usertree = FALSE; /* User-defined tree topologies */ adef->userwgt = FALSE; /* User-defined position weights */ extranum = 0; while ((ch = getc(INFILE)) != '\n' && ch != EOF) { uppercase(& ch); switch (ch) { case '1' : adef->prdata = ! adef->prdata; break; case '3' : adef->trprint = ! adef->trprint; break; case '4' : adef->trout = treeDefType - adef->trout; break; case 'B' : adef->boot = 1; extranum++; break; case 'C' : rdta->categs = -1; extranum++; break; case 'E' : adef->extra = -1; break; case 'F' : adef->empf = ! adef->empf; break; case 'G' : tr->global = -2; break; case 'I' : adef->interleaved = ! 
adef->interleaved; break; case 'J' : adef->jumble = 1; extranum++; break; case 'K' : extranum++; break; case 'L' : tr->userlen = TRUE; break; case 'O' : adef->root = TRUE; tr->outgr = 0; extranum++; break; case 'Q' : adef->qadd = FALSE; break; case 'R' : adef->restart = TRUE; break; case 'T' : rdta->ttratio = -1.0; extranum++; break; case 'U' : adef->usertree = TRUE; break; case 'W' : adef->userwgt = TRUE; cdta->wgtsum = 0; extranum++; break; case 'Y' : adef->trout = treeDefType - adef->trout; break; case ' ' : break; case '\t': break; default : printf("ERROR: Bad option character: '%c'\n", ch); return FALSE; } } if (ch == EOF) { printf("ERROR: End-of-file in options list\n"); return FALSE; } if (adef->usertree && adef->restart) { printf("ERROR: The restart and user-tree options conflict:\n"); printf(" Restart adds rest of taxa to a starting tree;\n"); printf(" User-tree does not add any taxa.\n\n"); return FALSE; } if (adef->usertree && adef->jumble) { printf("WARNING: The jumble and user-tree options conflict:\n"); printf(" Jumble adds taxa to a tree in random order;\n"); printf(" User-tree does not use taxa addition.\n"); printf(" Jumble option cancelled for this run.\n\n"); adef->jumble = FALSE; } if (tr->userlen && tr->global != -1) { printf("ERROR: The global and user-lengths options conflict:\n"); printf(" Global optimizes a starting tree;\n"); printf(" User-lengths constrain the starting tree.\n\n"); return FALSE; } if (tr->userlen && ! 
                        adef->usertree) {
      printf("WARNING: User lengths require user tree option.\n");
      printf(" User-tree option set for this run.\n\n");
      adef->usertree = TRUE;
      }

    rdta->wgt      = (int *)    Malloc((rdta->sites + 1) * sizeof(int));
    rdta->wgt2     = (int *)    Malloc((rdta->sites + 1) * sizeof(int));
    rdta->sitecat  = (int *)    Malloc((rdta->sites + 1) * sizeof(int));
    cdta->alias    = (int *)    Malloc((rdta->sites + 1) * sizeof(int));
    cdta->aliaswgt = (int *)    Malloc((rdta->sites + 1) * sizeof(int));
    cdta->patcat   = (int *)    Malloc((rdta->sites + 1) * sizeof(int));
    cdta->patrat   = (double *) Malloc((rdta->sites + 1) * sizeof(double));
    cdta->wr       = (double *) Malloc((rdta->sites + 1) * sizeof(double));
    cdta->wr2      = (double *) Malloc((rdta->sites + 1) * sizeof(double));
    if ( ! rdta->wgt || ! rdta->wgt2 || ! rdta->sitecat || ! cdta->alias
                     || ! cdta->aliaswgt || ! cdta->patcat || ! cdta->patrat
                     || ! cdta->wr || ! cdta->wr2) {
      fprintf(stderr, "getoptions: Malloc failure\n");
      return 0;
      }

    /*  process lines with auxiliary data */

    while (extranum--) {
      ch = getc(INFILE);
      uppercase(& ch);
      switch (ch) {
        case 'B':  if (! inputboot(adef))                return FALSE;  break;
        case 'C':  if (! inputcategories(rdta))          return FALSE;  break;
        case 'E':  if (! inputextra(adef))               return FALSE;
                   extranum++;  break;
        case 'F':  if (! inputfreqs(rdta))               return FALSE;  break;
        case 'G':  if (! inputglobal(tr))                return FALSE;
                   extranum++;  break;
        case 'J':  if (! inputjumble(adef))              return FALSE;  break;
        case 'K':  if (! inputkeep(adef))                return FALSE;  break;
        case 'O':  if (! inputoutgroup(adef, tr))        return FALSE;  break;
        case 'T':  if (! inputratio(rdta))               return FALSE;  break;
        case 'W':  if (! inputweights(adef, rdta, cdta)) return FALSE;  break;
        case 'Y':  if (! inputtreeopt(adef))             return FALSE;
                   extranum++;  break;

        default:
            printf("ERROR: Auxiliary options line starts with '%c'\n", ch);
            return FALSE;
        }
      }

    if (!
adef->userwgt) { for (i = 1; i <= rdta->sites; i++) rdta->wgt[i] = 1; cdta->wgtsum = rdta->sites; } if (adef->userwgt && cdta->wgtsum < 1) { printf("ERROR: Missing or bad user-supplied weight data.\n"); return FALSE; } if (adef->boot) { printf("Bootstrap random number seed = %ld\n\n", adef->boot); } if (adef->jumble) { printf("Jumble random number seed = %ld\n\n", adef->jumble); } if (adef->qadd) { printf("Quick add (only local branches initially optimized) in effect\n\n"); } if (rdta->categs > 0) { printf("Site category Rate of change\n\n"); for (i = 1; i <= rdta->categs; i++) printf(" %c%13.3f\n", itobase36(i), rdta->catrat[i]); putchar('\n'); for (i = 1; i <= rdta->sites; i++) { if ((rdta->wgt[i] > 0) && (rdta->sitecat[i] < 1)) { printf("ERROR: Bad category (%c) at site %d\n", itobase36(rdta->sitecat[i]), i); return FALSE; } } } else if (rdta->categs < 0) { printf("ERROR: Category auxiliary data missing from input\n"); return FALSE; } else { /* rdta->categs == 0 */ for (i = 1; i <= rdta->sites; i++) rdta->sitecat[i] = 1; rdta->categs = 1; } if (tr->outgr < 1) { printf("ERROR: Outgroup auxiliary data missing from input\n"); return FALSE; } if (rdta->ttratio < 0.0) { printf("ERROR: Transition/transversion auxiliary data missing from input\n"); return FALSE; } if (tr->global < 0) { if (tr->global == -2) tr->global = tr->mxtips - 3; /* Default global */ else tr->global = adef->usertree ? 0 : 1;/* No global */ } if (adef->restart) { printf("Restart option in effect. "); printf("Sequence addition will start from appended tree.\n\n"); } if (adef->usertree && ! tr->global) { printf("User-supplied tree topology%swill be used.\n\n", tr->userlen ? " and branch lengths " : " "); } else { if (! adef->usertree) { printf("Rearrangements of partial trees may cross %d %s.\n", tr->partswap, tr->partswap == 1 ? "branch" : "branches"); } printf("Rearrangements of full tree may cross %d %s.\n\n", tr->global, tr->global == 1 ? "branch" : "branches"); } if (! 
adef->usertree && adef->nkeep == 0) adef->nkeep = 1; return TRUE; } /* getoptions */ boolean getbasefreqs (rawdata *rdta) { /* getbasefreqs */ int ch; if (rdta->freqread) return TRUE; ch = getc(INFILE); if (! ((ch == 'F') || (ch == 'f'))) (void) ungetc(ch, INFILE); if (fscanf(INFILE, "%lf%lf%lf%lf", & rdta->freqa, & rdta->freqc, & rdta->freqg, & rdta->freqt) != 4 || findch('\n') == EOF) { printf("ERROR: Problem reading user base frequencies\n"); return FALSE; } return TRUE; } /* getbasefreqs */ boolean getyspace (rawdata *rdta) { /* getyspace */ long size; int i; yType *y0; if (! (rdta->y = (yType **) Malloc((rdta->numsp + 1) * sizeof(yType *)))) { printf("ERROR: Unable to obtain space for data array pointers\n"); return FALSE; } size = 4 * (rdta->sites / 4 + 1); if (! (y0 = (yType *) Malloc((rdta->numsp + 1) * size * sizeof(yType)))) { printf("ERROR: Unable to obtain space for data array\n"); return FALSE; } for (i = 0; i <= rdta->numsp; i++) { rdta->y[i] = y0; y0 += size; } return TRUE; } /* getyspace */ boolean setupTree (tree *tr, int nsites) { /* setupTree */ nodeptr p0, p, q; int i, j, tips, inter; tips = tr->mxtips; inter = tr->mxtips - 1; if (!(p0 = (nodeptr) Malloc((tips + 3*inter) * sizeof(node)))) { printf("ERROR: Unable to obtain sufficient tree memory\n"); return FALSE; } if (!(tr->nodep = (nodeptr *) Malloc((2*tr->mxtips) * sizeof(nodeptr)))) { printf("ERROR: Unable to obtain sufficient tree memory, too\n"); return FALSE; } tr->nodep[0] = (node *) NULL; /* Use as 1-based array */ for (i = 1; i <= tips; i++) { /* Set-up tips */ p = p0++; p->x = (xarray *) NULL; p->tip = (yType *) NULL; p->number = i; p->next = p; p->back = (node *) NULL; tr->nodep[i] = p; } for (i = tips + 1; i <= tips + inter; i++) { /* Internal nodes */ q = (node *) NULL; for (j = 1; j <= 3; j++) { p = p0++; p->x = (xarray *) NULL; p->tip = (yType *) NULL; p->number = i; p->next = q; p->back = (node *) NULL; q = p; } p->next->next->next = p; tr->nodep[i] = p; } tr->likelihood = 
unlikely; tr->start = (node *) NULL; tr->outgrnode = tr->nodep[tr->outgr]; tr->ntips = 0; tr->nextnode = 0; tr->opt_level = 0; tr->prelabeled = TRUE; tr->smoothed = FALSE; tr->log_f_valid = 0; tr->log_f = (double *) Malloc(nsites * sizeof(double)); if (! tr->log_f) { printf("ERROR: Unable to obtain sufficient tree memory, trey\n"); return FALSE; } return TRUE; } /* setupTree */ void freeTreeNode (nodeptr p) /* Free tree node (sector) associated data */ { /* freeTreeNode */ if (p) { if (p->x) { if (p->x->lv) Free(p->x->lv); Free(p->x); } } } /* freeTreeNode */ void freeTree (tree *tr) { /* freeTree */ nodeptr p, q; int i, tips, inter; tips = tr->mxtips; inter = tr->mxtips - 1; for (i = 1; i <= tips; i++) freeTreeNode(tr->nodep[i]); for (i = tips + 1; i <= tips + inter; i++) { if (p = tr->nodep[i]) { if (q = p->next) { freeTreeNode(q->next); freeTreeNode(q); } freeTreeNode(p); } } Free(tr->nodep[1]); /* Free the actual nodes */ } /* freeTree */ boolean getdata (analdef *adef, rawdata *rdta, tree *tr) /* read sequences */ { /* getdata */ int i, j, k, l, basesread, basesnew, ch; int meaning[256]; /* meaning of input characters */ char *nameptr; boolean allread, firstpass; for (i = 0; i <= 255; i++) meaning[i] = 0; meaning['A'] = 1; meaning['B'] = 14; meaning['C'] = 2; meaning['D'] = 13; meaning['G'] = 4; meaning['H'] = 11; meaning['K'] = 12; meaning['M'] = 3; meaning['N'] = 15; meaning['O'] = 15; meaning['R'] = 5; meaning['S'] = 6; meaning['T'] = 8; meaning['U'] = 8; meaning['V'] = 7; meaning['W'] = 9; meaning['X'] = 15; meaning['Y'] = 10; meaning['?'] = 15; meaning['-'] = 15; basesread = basesnew = 0; allread = FALSE; firstpass = TRUE; ch = ' '; while (! 
allread) { for (i = 1; i <= tr->mxtips; i++) { /* Read data line */ if (firstpass) { /* Read species names */ j = 1; while (whitechar(ch = getc(INFILE))) { /* Skip blank lines */ if (ch == '\n') j = 1; else j++; } if (j > nmlngth) { printf("ERROR: Blank name for species %d; ", i); printf("check number of species,\n"); printf(" number of sites, and interleave option.\n"); return FALSE; } nameptr = tr->nodep[i]->name; for (k = 1; k < j; k++) *nameptr++ = ' '; while (ch != '\n' && ch != EOF) { if (whitechar(ch)) ch = ' '; *nameptr++ = ch; if (++j > nmlngth) break; ch = getc(INFILE); } while (*(--nameptr) == ' ') ; /* remove trailing blanks */ *(++nameptr) = '\0'; /* add null termination */ if (ch == EOF) { printf("ERROR: End-of-file in name of species %d\n", i); return FALSE; } } /* if (firstpass) */ j = basesread; while ((j < rdta->sites) && ((ch = getc(INFILE)) != EOF) && ((! adef->interleaved) || (ch != '\n'))) { uppercase(& ch); if (meaning[ch] || ch == '.') { j++; if (ch == '.') { if (i != 1) ch = rdta->y[1][j]; else { printf("ERROR: Dot (.) found at site %d of sequence 1\n", j); return FALSE; } } rdta->y[i][j] = ch; } else if (whitechar(ch) || digitchar(ch)) ; else { printf("ERROR: Bad base (%c) at site %d of sequence %d\n", ch, j, i); return FALSE; } } if (ch == EOF) { printf("ERROR: End-of-file at site %d of sequence %d\n", j, i); return FALSE; } if (! 
firstpass && (j == basesread)) i--; /* no data on line */ else if (i == 1) basesnew = j; else if (j != basesnew) { printf("ERROR: Sequences out of alignment\n"); printf("%d (instead of %d) residues read in sequence %d\n", j - basesread, basesnew - basesread, i); return FALSE; } while (ch != '\n' && ch != EOF) ch = getc(INFILE); /* flush line */ } /* next sequence */ firstpass = FALSE; basesread = basesnew; allread = (basesread >= rdta->sites); } /* Print listing of sequence alignment */ if (adef->prdata) { j = nmlngth - 5 + ((rdta->sites + ((rdta->sites - 1)/10))/2); if (j < nmlngth - 1) j = nmlngth - 1; if (j > 37) j = 37; printf("Name"); for (i=1;i<=j;i++) putchar(' '); printf("Sequences\n"); printf("----"); for (i=1;i<=j;i++) putchar(' '); printf("---------\n"); putchar('\n'); for (i = 1; i <= rdta->sites; i += 60) { l = i + 59; if (l > rdta->sites) l = rdta->sites; if (adef->userwgt) { printf("Weights "); for (j = 11; j <= nmlngth+3; j++) putchar(' '); for (k = i; k <= l; k++) { putchar(itobase36(rdta->wgt[k])); if (((k % 10) == 0) && (k < l)) putchar(' '); } putchar('\n'); } if (rdta->categs > 1) { printf("Categories"); for (j = 11; j <= nmlngth+3; j++) putchar(' '); for (k = i; k <= l; k++) { putchar(itobase36(rdta->sitecat[k])); if (((k % 10) == 0) && (k < l)) putchar(' '); } putchar('\n'); } for (j = 1; j <= tr->mxtips; j++) { nameptr = tr->nodep[j]->name; k = nmlngth+3; while (ch = *nameptr++) {putchar(ch); k--;} while (--k >= 0) putchar(' '); for (k = i; k <= l; k++) { ch = rdta->y[j][k]; if ((j > 1) && (ch == rdta->y[1][k])) ch = '.'; putchar(ch); if (((k % 10) == 0) && (k < l)) putchar(' '); } putchar('\n'); } putchar('\n'); } } for (j = 1; j <= tr->mxtips; j++) /* Convert characters to meanings */ for (i = 1; i <= rdta->sites; i++) { rdta->y[j][i] = meaning[rdta->y[j][i]]; } return TRUE; } /* getdata */ boolean getntrees (analdef *adef) { /* getntrees */ if (fscanf(INFILE, "%d", &(adef->numutrees)) != 1 || findch('\n') == EOF) { printf("ERROR: Problem 
reading number of user trees\n"); return FALSE; } if (adef->nkeep == 0) adef->nkeep = adef->numutrees; return TRUE; } /* getntrees */ boolean getinput (analdef *adef, rawdata *rdta, cruncheddata *cdta, tree *tr) { /* getinput */ if (! getnums(rdta)) return FALSE; if (! getoptions(adef, rdta, cdta, tr)) return FALSE; if (! adef->empf && ! getbasefreqs(rdta)) return FALSE; if (! getyspace(rdta)) return FALSE; if (! setupTree(tr, rdta->sites)) return FALSE; if (! getdata(adef, rdta, tr)) return FALSE; if (adef->usertree && ! getntrees(adef)) return FALSE; return TRUE; } /* getinput */ void makeboot (analdef *adef, rawdata *rdta, cruncheddata *cdta) { /* makeboot */ int i, j, nonzero; double randum(); nonzero = 0; for (i = 1; i <= rdta->sites; i++) if (rdta->wgt[i] > 0) nonzero++; for (j = 1; j <= nonzero; j++) cdta->aliaswgt[j] = 0; for (j = 1; j <= nonzero; j++) cdta->aliaswgt[(int) (nonzero*randum(& adef->boot)) + 1]++; j = 0; cdta->wgtsum = 0; for (i = 1; i <= rdta->sites; i++) { if (rdta->wgt[i] > 0) cdta->wgtsum += (rdta->wgt2[i] = rdta->wgt[i] * cdta->aliaswgt[++j]); else rdta->wgt2[i] = 0; } } /* makeboot */ void sitesort (rawdata *rdta, cruncheddata *cdta) /* Shell sort keeping sites with identical residues and weights in * the original order (i.e., a stable sort). * The index created in cdta->alias is 1 based. The * sitecombcrunch routine packs it to a 0 based index. 
*/ { /* sitesort */ int gap, i, j, jj, jg, k, n, nsp; int *index, *category; boolean flip, tied; yType **data; index = cdta->alias; category = rdta->sitecat; data = rdta->y; n = rdta->sites; nsp = rdta->numsp; for (gap = n / 2; gap > 0; gap /= 2) { for (i = gap + 1; i <= n; i++) { j = i - gap; do { jj = index[j]; jg = index[j+gap]; flip = (category[jj] > category[jg]); tied = (category[jj] == category[jg]); for (k = 1; (k <= nsp) && tied; k++) { flip = (data[k][jj] > data[k][jg]); tied = (data[k][jj] == data[k][jg]); } if (flip) { index[j] = jg; index[j+gap] = jj; j -= gap; } } while (flip && (j > 0)); } /* for (i ... */ } /* for (gap ... */ } /* sitesort */ void sitecombcrunch (rawdata *rdta, cruncheddata *cdta) /* combine sites that have identical patterns (and nonzero weight) */ { /* sitecombcrunch */ int i, sitei, j, sitej, k; boolean tied; i = 0; cdta->alias[0] = cdta->alias[1]; cdta->aliaswgt[0] = 0; for (j = 1; j <= rdta->sites; j++) { sitei = cdta->alias[i]; sitej = cdta->alias[j]; tied = (rdta->sitecat[sitei] == rdta->sitecat[sitej]); for (k = 1; tied && (k <= rdta->numsp); k++) tied = (rdta->y[k][sitei] == rdta->y[k][sitej]); if (tied) { cdta->aliaswgt[i] += rdta->wgt2[sitej]; } else { if (cdta->aliaswgt[i] > 0) i++; cdta->aliaswgt[i] = rdta->wgt2[sitej]; cdta->alias[i] = sitej; } } cdta->endsite = i; if (cdta->aliaswgt[i] > 0) cdta->endsite++; } /* sitecombcrunch */ boolean makeweights (analdef *adef, rawdata *rdta, cruncheddata *cdta) /* make up weights vector to avoid duplicate computations */ { /* makeweights */ int i; if (adef->boot) makeboot(adef, rdta, cdta); else for (i = 1; i <= rdta->sites; i++) rdta->wgt2[i] = rdta->wgt[i]; for (i = 1; i <= rdta->sites; i++) cdta->alias[i] = i; sitesort(rdta, cdta); sitecombcrunch(rdta, cdta); printf("Total weight of positions in analysis = %d\n", cdta->wgtsum); printf("There are %d distinct data patterns (columns)\n\n", cdta->endsite); return TRUE; } /* makeweights */ boolean makevalues (rawdata *rdta, 
cruncheddata *cdta) /* set up fractional likelihoods at tips */ { /* makevalues */ double temp, wtemp; int i, j; for (i = 1; i <= rdta->numsp; i++) { /* Pack and move tip data */ for (j = 0; j < cdta->endsite; j++) { rdta->y[i-1][j] = rdta->y[i][cdta->alias[j]]; } } for (j = 0; j < cdta->endsite; j++) { cdta->patcat[j] = i = rdta->sitecat[cdta->alias[j]]; cdta->patrat[j] = temp = rdta->catrat[i]; cdta->wr[j] = wtemp = temp * cdta->aliaswgt[j]; cdta->wr2[j] = temp * wtemp; } return TRUE; } /* makevalues */ boolean empiricalfreqs (rawdata *rdta, cruncheddata *cdta) /* Get empirical base frequencies from the data */ { /* empiricalfreqs */ double sum, suma, sumc, sumg, sumt, wj, fa, fc, fg, ft; int i, j, k, code; yType *yptr; rdta->freqa = 0.25; rdta->freqc = 0.25; rdta->freqg = 0.25; rdta->freqt = 0.25; for (k = 1; k <= 8; k++) { suma = 0.0; sumc = 0.0; sumg = 0.0; sumt = 0.0; for (i = 0; i < rdta->numsp; i++) { yptr = rdta->y[i]; for (j = 0; j < cdta->endsite; j++) { code = *yptr++; fa = rdta->freqa * ( code & 1); fc = rdta->freqc * ((code >> 1) & 1); fg = rdta->freqg * ((code >> 2) & 1); ft = rdta->freqt * ((code >> 3) & 1); wj = cdta->aliaswgt[j] / (fa + fc + fg + ft); suma += wj * fa; sumc += wj * fc; sumg += wj * fg; sumt += wj * ft; } } sum = suma + sumc + sumg + sumt; rdta->freqa = suma / sum; rdta->freqc = sumc / sum; rdta->freqg = sumg / sum; rdta->freqt = sumt / sum; } return TRUE; } /* empiricalfreqs */ void reportfreqs (analdef *adef, rawdata *rdta) { /* reportfreqs */ double suma, sumb; if (adef->empf) printf("Empirical "); printf("Base Frequencies:\n\n"); printf(" A %10.5f\n", rdta->freqa); printf(" C %10.5f\n", rdta->freqc); printf(" G %10.5f\n", rdta->freqg); printf(" T(U) %10.5f\n\n", rdta->freqt); rdta->freqr = rdta->freqa + rdta->freqg; rdta->invfreqr = 1.0/rdta->freqr; rdta->freqar = rdta->freqa * rdta->invfreqr; rdta->freqgr = rdta->freqg * rdta->invfreqr; rdta->freqy = rdta->freqc + rdta->freqt; rdta->invfreqy = 1.0/rdta->freqy; rdta->freqcy = 
rdta->freqc * rdta->invfreqy; rdta->freqty = rdta->freqt * rdta->invfreqy; printf("Transition/transversion ratio = %10.6f\n\n", rdta->ttratio); suma = rdta->ttratio*rdta->freqr*rdta->freqy - (rdta->freqa*rdta->freqg + rdta->freqc*rdta->freqt); sumb = rdta->freqa*rdta->freqgr + rdta->freqc*rdta->freqty; rdta->xi = suma/(suma+sumb); rdta->xv = 1.0 - rdta->xi; if (rdta->xi <= 0.0) { printf("WARNING: This transition/transversion ratio\n"); printf(" is impossible with these base frequencies!\n"); printf("Transition/transversion parameter reset\n\n"); rdta->xi = 0.000001; rdta->xv = 1.0 - rdta->xi; rdta->ttratio = (sumb * rdta->xi / rdta->xv + rdta->freqa * rdta->freqg + rdta->freqc * rdta->freqt) / (rdta->freqr * rdta->freqy); printf("Transition/transversion ratio = %10.6f\n\n", rdta->ttratio); } printf("(Transition/transversion parameter = %10.6f)\n\n", rdta->xi/rdta->xv); rdta->fracchange = 2.0 * rdta->xi * (rdta->freqa * rdta->freqgr + rdta->freqc * rdta->freqty) + rdta->xv * (1.0 - rdta->freqa * rdta->freqa - rdta->freqc * rdta->freqc - rdta->freqg * rdta->freqg - rdta->freqt * rdta->freqt); } /* reportfreqs */ boolean linkdata2tree (rawdata *rdta, cruncheddata *cdta, tree *tr) /* Link data array to the tree tips */ { /* linkdata2tree */ int i; for (i = 1; i <= tr->mxtips; i++) { /* Associate data with tips */ tr->nodep[i]->tip = &(rdta->y[i-1][0]); } tr->rdta = rdta; tr->cdta = cdta; return TRUE; } /* linkdata2tree */ xarray *setupxarray (int npat) { /* setupxarray */ xarray *x; likelivector *data; x = (xarray *) Malloc(sizeof(xarray)); if (x) { data = (likelivector *) Malloc(npat * sizeof(likelivector)); if (data) { x->lv = data; x->prev = x->next = x; x->owner = (node *) NULL; } else { Free(x); return (xarray *) NULL; } } return x; } /* setupxarray */ boolean linkxarray (int req, int min, int npat, xarray **freexptr, xarray **usedxptr) /* Link a set of xarrays */ { /* linkxarray */ xarray *first, *prev, *x; int i; first = prev = (xarray *) NULL; i = 0; do { x = 
setupxarray(npat); if (x) { if (! first) first = x; else { prev->next = x; x->prev = prev; } prev = x; i++; } else { printf("ERROR: Failure to get requested xarray memory\n"); if (i < min) return FALSE; } } while ((i < req) && x); if (first) { first->prev = prev; prev->next = first; } *freexptr = first; *usedxptr = (xarray *) NULL; return TRUE; } /* linkxarray */ boolean setupnodex (tree *tr) { /* setupnodex */ nodeptr p; int i; for (i = tr->mxtips + 1; (i <= 2*(tr->mxtips) - 2); i++) { p = tr->nodep[i]; if (! (p->x = setupxarray(tr->cdta->endsite))) { printf("ERROR: Failure to get internal node xarray memory\n"); return FALSE; } } return TRUE; } /* setupnodex */ xarray *getxtip (nodeptr p) { /* getxtip */ xarray *new; boolean splice; if (! p) return (xarray *) NULL; splice = FALSE; if (p->x) { /* array is there; move to tail of list */ new = p->x; if (new == new->prev) ; /* linked to self; leave it */ else if (new == usedxtip) usedxtip = usedxtip->next; /* at head */ else if (new == usedxtip->prev) ; /* already at tail */ else { /* move to tail of list */ new->prev->next = new->next; new->next->prev = new->prev; splice = TRUE; } } else if (freextip) { /* take from unused list */ p->x = new = freextip; new->owner = p; if (new->prev != new) { /* not only member of freelist */ new->prev->next = new->next; new->next->prev = new->prev; freextip = new->next; } else freextip = (xarray *) NULL; splice = TRUE; } else if (usedxtip) { /* take from head of used list */ usedxtip->owner->x = (xarray *) NULL; p->x = new = usedxtip; new->owner = p; usedxtip = usedxtip->next; } else { printf("ERROR: Unable to locate memory for tip %d.\n", p->number); return (xarray *) NULL; } if (splice) { if (usedxtip) { /* list is not empty */ usedxtip->prev->next = new; new->prev = usedxtip->prev; usedxtip->prev = new; new->next = usedxtip; } else usedxtip = new->prev = new->next = new; } return new; } /* getxtip */ xarray *getxnode (nodeptr p) /* Ensure that internal node p has memory */ { /* 
getxnode */ nodeptr s; if (! (p->x)) { /* Move likelihood array on this node to sector p */ if ((s = p->next)->x || (s = s->next)->x) { p->x = s->x; s->x = (xarray *) NULL; } else { printf("ERROR: Unable to locate memory at node %d.\n", p->number); exit(1); } } return p->x; } /* getxnode */ boolean newview (tree *tr, nodeptr p) /* Update likelihoods at node */ { /* newview */ double zq, lzq, xvlzq, zr, lzr, xvlzr; nodeptr q, r; likelivector *lp, *lq, *lr; int i; if (p->tip) { /* Make sure that data are at tip */ likelivector *l; int code; yType *yptr; if (p->x) return TRUE; /* They are already there */ if (! getxtip(p)) return FALSE; /* They are not, so get memory */ l = p->x->lv; /* Pointer to first likelihood vector value */ yptr = p->tip; /* Pointer to first nucleotide datum */ for (i = 0; i < tr->cdta->endsite; i++) { code = *yptr++; l->a = code & 1; l->c = (code >> 1) & 1; l->g = (code >> 2) & 1; l->t = (code >> 3) & 1; l->exp = 0; l++; } return TRUE; } /* Internal node needs update */ q = p->next->back; r = p->next->next->back; while ((! p->x) || (! q->x) || (! r->x)) { if (! q->x) if (! newview(tr, q)) return FALSE; if (! r->x) if (! newview(tr, r)) return FALSE; if (! p->x) if (! getxnode(p)) return FALSE; } lp = p->x->lv; lq = q->x->lv; zq = q->z; lzq = (zq > zmin) ? log(zq) : log(zmin); xvlzq = tr->rdta->xv * lzq; lr = r->x->lv; zr = r->z; lzr = (zr > zmin) ? 
log(zr) : log(zmin);
    xvlzr = tr->rdta->xv * lzr;

    { double  zzqtable[maxcategories+1], zvqtable[maxcategories+1],
              zzrtable[maxcategories+1], zvrtable[maxcategories+1],
              *zzqptr, *zvqptr, *zzrptr, *zvrptr, *rptr;
      double  fxqr, fxqy, fxqn, sumaq, sumgq, sumcq, sumtq,
              fxrr, fxry, fxrn, ki, tempi, tempj;
      int  *cptr;
# ifdef Vectorize
      double zzq[maxpatterns], zvq[maxpatterns],
             zzr[maxpatterns], zvr[maxpatterns];
      int    cat;
# else
      double zzq, zvq, zzr, zvr;
      int    cat;
# endif

      rptr = &(tr->rdta->catrat[1]);
      zzqptr = &(zzqtable[1]);
      zvqptr = &(zvqtable[1]);
      zzrptr = &(zzrtable[1]);
      zvrptr = &(zvrtable[1]);

# ifdef Vectorize
# pragma IVDEP
# endif

      for (i = 1; i <= tr->rdta->categs; i++) {   /* exps for each category */
        ki = *rptr++;
        *zzqptr++ = exp(ki * lzq);
        *zvqptr++ = exp(ki * xvlzq);
        *zzrptr++ = exp(ki * lzr);
        *zvrptr++ = exp(ki * xvlzr);
        }

      cptr = &(tr->cdta->patcat[0]);

# ifdef Vectorize
# pragma IVDEP
      for (i = 0; i < tr->cdta->endsite; i++) {
        cat = *cptr++;
        zzq[i] = zzqtable[cat];
        zvq[i] = zvqtable[cat];
        zzr[i] = zzrtable[cat];
        zvr[i] = zvrtable[cat];
        }

# pragma IVDEP
      for (i = 0; i < tr->cdta->endsite; i++) {
        fxqr = tr->rdta->freqa * lq->a + tr->rdta->freqg * lq->g;
        fxqy = tr->rdta->freqc * lq->c + tr->rdta->freqt * lq->t;
        fxqn = fxqr + fxqy;
        tempi = fxqr * tr->rdta->invfreqr;
        tempj = zvq[i] * (tempi-fxqn) + fxqn;
        sumaq = zzq[i] * (lq->a - tempi) + tempj;
        sumgq = zzq[i] * (lq->g - tempi) + tempj;
        tempi = fxqy * tr->rdta->invfreqy;
        tempj = zvq[i] * (tempi-fxqn) + fxqn;
        sumcq = zzq[i] * (lq->c - tempi) + tempj;
        sumtq = zzq[i] * (lq->t - tempi) + tempj;

        fxrr = tr->rdta->freqa * lr->a + tr->rdta->freqg * lr->g;
        fxry = tr->rdta->freqc * lr->c + tr->rdta->freqt * lr->t;
        fxrn = fxrr + fxry;
        tempi = fxrr * tr->rdta->invfreqr;
        tempj = zvr[i] * (tempi-fxrn) + fxrn;
        lp->a = sumaq * (zzr[i] * (lr->a - tempi) + tempj);
        lp->g = sumgq * (zzr[i] * (lr->g - tempi) + tempj);
        tempi = fxry * tr->rdta->invfreqy;
        tempj = zvr[i] * (tempi-fxrn) + fxrn;
        lp->c = sumcq * (zzr[i] * (lr->c - tempi) +
tempj);
        lp->t = sumtq * (zzr[i] * (lr->t - tempi) + tempj);
        lp->exp = lq->exp + lr->exp;
        if (lp->a < minlikelihood && lp->g < minlikelihood
         && lp->c < minlikelihood && lp->t < minlikelihood) {
          lp->a *= twotothe256;
          lp->g *= twotothe256;
          lp->c *= twotothe256;
          lp->t *= twotothe256;
          lp->exp += 1;
          }
        lp++;
        lq++;
        lr++;
        }

# else  /* Not Vectorize */
      for (i = 0; i < tr->cdta->endsite; i++) {
        cat = *cptr++;
        zzq = zzqtable[cat];
        zvq = zvqtable[cat];
        fxqr = tr->rdta->freqa * lq->a + tr->rdta->freqg * lq->g;
        fxqy = tr->rdta->freqc * lq->c + tr->rdta->freqt * lq->t;
        fxqn = fxqr + fxqy;
        tempi = fxqr * tr->rdta->invfreqr;
        tempj = zvq * (tempi-fxqn) + fxqn;
        sumaq = zzq * (lq->a - tempi) + tempj;
        sumgq = zzq * (lq->g - tempi) + tempj;
        tempi = fxqy * tr->rdta->invfreqy;
        tempj = zvq * (tempi-fxqn) + fxqn;
        sumcq = zzq * (lq->c - tempi) + tempj;
        sumtq = zzq * (lq->t - tempi) + tempj;

        zzr = zzrtable[cat];
        zvr = zvrtable[cat];
        fxrr = tr->rdta->freqa * lr->a + tr->rdta->freqg * lr->g;
        fxry = tr->rdta->freqc * lr->c + tr->rdta->freqt * lr->t;
        fxrn = fxrr + fxry;
        tempi = fxrr * tr->rdta->invfreqr;
        tempj = zvr * (tempi-fxrn) + fxrn;
        lp->a = sumaq * (zzr * (lr->a - tempi) + tempj);
        lp->g = sumgq * (zzr * (lr->g - tempi) + tempj);
        tempi = fxry * tr->rdta->invfreqy;
        tempj = zvr * (tempi-fxrn) + fxrn;
        lp->c = sumcq * (zzr * (lr->c - tempi) + tempj);
        lp->t = sumtq * (zzr * (lr->t - tempi) + tempj);
        lp->exp = lq->exp + lr->exp;
        if (lp->a < minlikelihood && lp->g < minlikelihood
         && lp->c < minlikelihood && lp->t < minlikelihood) {
          lp->a *= twotothe256;
          lp->g *= twotothe256;
          lp->c *= twotothe256;
          lp->t *= twotothe256;
          lp->exp += 1;
          }
        lp++;
        lq++;
        lr++;
        }
# endif  /* Vectorize or not */

      return TRUE;
      }
  } /* newview */


double evaluate (tree *tr, nodeptr p)
  { /* evaluate */
    double   sum, z, lz, xvlz,
             ki, fxpa, fxpc, fxpg, fxpt, fxpr, fxpy, fxqr, fxqy,
             suma, sumb, sumc, term;
# ifdef Vectorize
    double   zz[maxpatterns], zv[maxpatterns];
# else
    double   zz, zv;
# endif
    double   zztable[maxcategories+1],
zvtable[maxcategories+1],
             *zzptr, *zvptr;
    double   *log_f, *rptr;
    likelivector  *lp, *lq;
    nodeptr  q;
    int  cat, *cptr, i, *wptr;

    q = p->back;
    while ((! p->x) || (! q->x)) {
      if (! (p->x)) if (! newview(tr, p)) return badEval;
      if (! (q->x)) if (! newview(tr, q)) return badEval;
      }

    lp = p->x->lv;
    lq = q->x->lv;

    z = p->z;
    if (z < zmin) z = zmin;
    lz = log(z);
    xvlz = tr->rdta->xv * lz;

    rptr = &(tr->rdta->catrat[1]);
    zzptr = &(zztable[1]);
    zvptr = &(zvtable[1]);

# ifdef Vectorize
# pragma IVDEP
# endif

    for (i = 1; i <= tr->rdta->categs; i++) {
      ki = *rptr++;
      *zzptr++ = exp(ki * lz);
      *zvptr++ = exp(ki * xvlz);
      }

    wptr = &(tr->cdta->aliaswgt[0]);
    cptr = &(tr->cdta->patcat[0]);
    log_f = tr->log_f;
    tr->log_f_valid = tr->cdta->endsite;
    sum = 0.0;

# ifdef Vectorize
# pragma IVDEP
    for (i = 0; i < tr->cdta->endsite; i++) {
      cat = *cptr++;
      zz[i] = zztable[cat];
      zv[i] = zvtable[cat];
      }

# pragma IVDEP
    for (i = 0; i < tr->cdta->endsite; i++) {
      fxpa = tr->rdta->freqa * lp->a;
      fxpg = tr->rdta->freqg * lp->g;
      fxpc = tr->rdta->freqc * lp->c;
      fxpt = tr->rdta->freqt * lp->t;
      suma = fxpa * lq->a + fxpc * lq->c + fxpg * lq->g + fxpt * lq->t;
      fxqr = tr->rdta->freqa * lq->a + tr->rdta->freqg * lq->g;
      fxqy = tr->rdta->freqc * lq->c + tr->rdta->freqt * lq->t;
      fxpr = fxpa + fxpg;
      fxpy = fxpc + fxpt;
      sumc = (fxpr + fxpy) * (fxqr + fxqy);
      sumb = fxpr * fxqr * tr->rdta->invfreqr + fxpy * fxqy * tr->rdta->invfreqy;
      suma -= sumb;
      sumb -= sumc;
      term = log(zz[i] * suma + zv[i] * sumb + sumc) + (lp->exp + lq->exp)*log(minlikelihood);
      sum += *wptr++ * term;
      *log_f++ = term;
      lp++;
      lq++;
      }

# else  /* Not Vectorize */
    for (i = 0; i < tr->cdta->endsite; i++) {
      cat = *cptr++;
      zz = zztable[cat];
      zv = zvtable[cat];
      fxpa = tr->rdta->freqa * lp->a;
      fxpg = tr->rdta->freqg * lp->g;
      fxpc = tr->rdta->freqc * lp->c;
      fxpt = tr->rdta->freqt * lp->t;
      suma = fxpa * lq->a + fxpc * lq->c + fxpg * lq->g + fxpt * lq->t;
      fxqr = tr->rdta->freqa * lq->a + tr->rdta->freqg * lq->g;
      fxqy = tr->rdta->freqc * lq->c + tr->rdta->freqt * lq->t;
      fxpr = fxpa + fxpg;
      fxpy = fxpc + fxpt;
      sumc = (fxpr + fxpy) * (fxqr + fxqy);
      sumb = fxpr * fxqr * tr->rdta->invfreqr + fxpy * fxqy * tr->rdta->invfreqy;
      suma -= sumb;
      sumb -= sumc;
      term = log(zz * suma + zv * sumb + sumc) + (lp->exp + lq->exp)*log(minlikelihood);
      /* printf("evaluate: %le\n", term); */
      sum += *wptr++ * term;
      *log_f++ = term;
      lp++;
      lq++;
      }
# endif  /* Vectorize or not */

    tr->likelihood = sum;
    return sum;
  } /* evaluate */


double makenewz (tree *tr, nodeptr p, nodeptr q, double z0, int maxiter)
  { /* makenewz */
    likelivector *lp, *lq;
    double  *abi, *bci, *sumci, *abptr, *bcptr, *sumcptr;
    double   dlnLidlz, dlnLdlz, d2lnLdlz2, z, zprev, zstep, lz, xvlz,
             ki, suma, sumb, sumc, ab, bc, inv_Li, t1, t2,
             fx1a, fx1c, fx1g, fx1t, fx1r, fx1y, fx2r, fx2y;
    double   zztable[maxcategories+1], zvtable[maxcategories+1],
             *zzptr, *zvptr;
    double  *rptr, *wrptr, *wr2ptr;
    int      cat, *cptr, i, curvatOK;

    while ((! p->x) || (! q->x)) {
      if (! (p->x)) if (! newview(tr, p)) return badZ;
      if (! (q->x)) if (! newview(tr, q)) return badZ;
      }

    lp = p->x->lv;
    lq = q->x->lv;

    { unsigned scratch_size;
      scratch_size = sizeof(double) * tr->cdta->endsite;
      if ((abi = (double *) Malloc(scratch_size)) &&
          (bci = (double *) Malloc(scratch_size)) &&
          (sumci = (double *) Malloc(scratch_size))) ;
      else {
        printf("ERROR: makenewz unable to obtain space for arrays\n");
        return badZ;
        }
      }

    abptr = abi;
    bcptr = bci;
    sumcptr = sumci;

# ifdef Vectorize
# pragma IVDEP
# endif

    for (i = 0; i < tr->cdta->endsite; i++) {
      fx1a = tr->rdta->freqa * lp->a;
      fx1g = tr->rdta->freqg * lp->g;
      fx1c = tr->rdta->freqc * lp->c;
      fx1t = tr->rdta->freqt * lp->t;
      suma = fx1a * lq->a + fx1c * lq->c + fx1g * lq->g + fx1t * lq->t;
      fx2r = tr->rdta->freqa * lq->a + tr->rdta->freqg * lq->g;
      fx2y = tr->rdta->freqc * lq->c + tr->rdta->freqt * lq->t;
      fx1r = fx1a + fx1g;
      fx1y = fx1c + fx1t;
      *sumcptr++ = sumc = (fx1r + fx1y) * (fx2r + fx2y);
      sumb = fx1r * fx2r * tr->rdta->invfreqr + fx1y * fx2y * tr->rdta->invfreqy;
      *abptr++ = suma - sumb;
      *bcptr++ =
sumb - sumc;
      lp++;
      lq++;
      }

    z = z0;
    do {
      zprev = z;
      zstep = (1.0 - zmax) * z + zmin;
      curvatOK = FALSE;

      do {
        if (z < zmin) z = zmin;
        else if (z > zmax) z = zmax;
        lz = log(z);
        xvlz = tr->rdta->xv * lz;
        rptr = &(tr->rdta->catrat[1]);
        zzptr = &(zztable[1]);
        zvptr = &(zvtable[1]);

# ifdef Vectorize
# pragma IVDEP
# endif

        for (i = 1; i <= tr->rdta->categs; i++) {
          ki = *rptr++;
          *zzptr++ = exp(ki * lz);
          *zvptr++ = exp(ki * xvlz);
          }

        abptr = abi;
        bcptr = bci;
        sumcptr = sumci;
        cptr = &(tr->cdta->patcat[0]);
        wrptr = &(tr->cdta->wr[0]);
        wr2ptr = &(tr->cdta->wr2[0]);
        dlnLdlz = 0.0;                 /* = d(ln(likelihood))/d(lz) */
        d2lnLdlz2 = 0.0;               /* = d2(ln(likelihood))/d(lz)2 */

# ifdef Vectorize
# pragma IVDEP
# endif

        for (i = 0; i < tr->cdta->endsite; i++) {
          cat = *cptr++;               /* ratecategory(i) */
          ab = *abptr++ * zztable[cat];
          bc = *bcptr++ * zvtable[cat];
          sumc = *sumcptr++;
          inv_Li = 1.0/(ab + bc + sumc);
          t1 = ab * inv_Li;
          t2 = tr->rdta->xv * bc * inv_Li;
          dlnLidlz = t1 + t2;
          dlnLdlz += *wrptr++ * dlnLidlz;
          d2lnLdlz2 += *wr2ptr++ * (t1 + tr->rdta->xv * t2 - dlnLidlz * dlnLidlz);
          }

        if ((d2lnLdlz2 >= 0.0) && (z < zmax))
          zprev = z = 0.37 * z + 0.63; /* Bad curvature, shorten branch */
        else
          curvatOK = TRUE;

        } while (! curvatOK);

      if (d2lnLdlz2 < 0.0) {
        z *= exp(-dlnLdlz / d2lnLdlz2);
        if (z < zmin) z = zmin;
        if (z > 0.25 * zprev + 0.75)   /* Limit steps toward z = 1.0 */
          z = 0.25 * zprev + 0.75;
        }

      if (z > zmax) z = zmax;

      } while ((--maxiter > 0) && (ABS(z - zprev) > zstep));

    Free(abi);
    Free(bci);
    Free(sumci);

    /* printf("makenewz: %le\n", z); */

    return z;
  } /* makenewz */


boolean update (tree *tr, nodeptr p)
  { /* update */
    nodeptr  q;
    double   z0, z;

    q = p->back;
    z0 = q->z;
    if ((z = makenewz(tr, p, q, z0, newzpercycle)) == badZ) return FALSE;
    p->z = q->z = z;
    if (ABS(z - z0) > deltaz) tr->smoothed = FALSE;
    return TRUE;
  } /* update */


boolean smooth (tree *tr, nodeptr p)
  { /* smooth */
    nodeptr  q;

    if (! update(tr, p)) return FALSE;     /* Adjust branch */
    if (!
p->tip) {                              /* Adjust descendants */
      q = p->next;
      while (q != p) {
        if (! smooth(tr, q->back)) return FALSE;
        q = q->next;
        }

# if ReturnSmoothedView
      if (! newview(tr, p)) return FALSE;
# endif
      }

    return TRUE;
  } /* smooth */


boolean smoothTree (tree *tr, int maxtimes)
  { /* smoothTree */
    nodeptr  p, q;

    p = tr->start;

    while (--maxtimes >= 0) {
      tr->smoothed = TRUE;
      if (! smooth(tr, p->back)) return FALSE;
      if (! p->tip) {
        q = p->next;
        while (q != p) {
          if (! smooth(tr, q->back)) return FALSE;
          q = q->next;
          }
        }
      if (tr->smoothed) break;
      }

    return TRUE;
  } /* smoothTree */


boolean localSmooth (tree *tr, nodeptr p, int maxtimes)
  { /* localSmooth -- Smooth branches around p */
    nodeptr  q;

    if (p->tip) return FALSE;              /* Should be an error */

    while (--maxtimes >= 0) {
      tr->smoothed = TRUE;
      q = p;
      do {
        if (! update(tr, q)) return FALSE;
        q = q->next;
        } while (q != p);
      if (tr->smoothed) break;
      }

    tr->smoothed = FALSE;                  /* Only smooth locally */
    return TRUE;
  } /* localSmooth */


void hookup (nodeptr p, nodeptr q, double z)
  { /* hookup */
    p->back = q;
    q->back = p;
    p->z = q->z = z;
  } /* hookup */


/* Insert node p into branch q <-> q->back */

boolean insert (tree *tr, nodeptr p, nodeptr q, boolean glob)
                /* glob -- Smooth tree globally? */

/*                q
                 /.
             add/ .
               /  .
             pn   .
    s ---- p      .remove
             pnn  .
               \  .
             add\ .
                 \.      pn = p->next;
                  r      pnn = p->next->next;
 */

  { /* insert */
    nodeptr  r, s;

    r = q->back;
    s = p->back;

# if BestInsertAverage && ! Master
    { double  zqr, zqs, zrs, lzqr, lzqs, lzrs, lzsum, lzq, lzr, lzs, lzmax;

      if ((zqr = makenewz(tr, q, r, q->z, iterations)) == badZ) return FALSE;
      if ((zqs = makenewz(tr, q, s, defaultz, iterations)) == badZ) return FALSE;
      if ((zrs = makenewz(tr, r, s, defaultz, iterations)) == badZ) return FALSE;

      lzqr = (zqr > zmin) ? log(zqr) : log(zmin);  /* long branches */
      lzqs = (zqs > zmin) ? log(zqs) : log(zmin);
      lzrs = (zrs > zmin) ?
log(zrs) : log(zmin);
      lzsum = 0.5 * (lzqr + lzqs + lzrs);

      lzq = lzsum - lzrs;
      lzr = lzsum - lzqs;
      lzs = lzsum - lzqr;
      lzmax = log(zmax);

      if (lzq > lzmax) {lzq = lzmax; lzr = lzqr; lzs = lzqs;}  /* short */
      else if (lzr > lzmax) {lzr = lzmax; lzq = lzqr; lzs = lzrs;}
      else if (lzs > lzmax) {lzs = lzmax; lzq = lzqs; lzr = lzrs;}

      hookup(p->next, q, exp(lzq));
      hookup(p->next->next, r, exp(lzr));
      hookup(p, s, exp(lzs));
      }

# else
    { double  z;

      z = sqrt(q->z);
      hookup(p->next, q, z);
      hookup(p->next->next, r, z);
      }

# endif

    if (! newview(tr, p)) return FALSE;    /* So that p is valid at update */
    tr->opt_level = 0;

# if ! Master  /* Smoothings are done by slave */
    if (glob) {                            /* Smooth whole tree */
      if (! smoothTree(tr, smoothings)) return FALSE;
      }
    else {                                 /* Smooth locale of p */
      if (! localSmooth(tr, p, smoothings)) return FALSE;
      }

# else
    tr->likelihood = unlikely;
# endif

    return TRUE;
  } /* insert */


nodeptr removeNode (tree *tr, nodeptr p)

/*                q
                 .|
          remove. |
               .  |
             pn   |
    s ---- p      |add
             pnn  |
               .  |
          remove. |
                 .|      pn = p->next;
                  r      pnn = p->next->next;
 */

    /* remove p and return where it was */
  { /* removeNode */
    double   zqr;
    nodeptr  q, r;

    q = p->next->back;
    r = p->next->next->back;
    zqr = q->z * r->z;
# if ! Master
    if ((zqr = makenewz(tr, q, r, zqr, iterations)) == badZ) return (node *) NULL;
# endif
    hookup(q, r, zqr);

    p->next->next->back = p->next->back = (node *) NULL;
    return q;
  } /* removeNode */


boolean initrav (tree *tr, nodeptr p)
  { /* initrav */
    nodeptr  q;

    if (! p->tip) {
      q = p->next;
      do {
        if (! initrav(tr, q->back)) return FALSE;
        q = q->next;
        } while (q != p);
      if (!
newview(tr, p)) return FALSE; } return TRUE; } /* initrav */ nodeptr buildNewTip (tree *tr, nodeptr p) { /* buildNewTip */ nodeptr q; q = tr->nodep[(tr->nextnode)++]; hookup(p, q, defaultz); return q; } /* buildNewTip */ boolean buildSimpleTree (tree *tr, int ip, int iq, int ir) { /* buildSimpleTree */ /* p, q and r are tips meeting at s */ nodeptr p, s; int i; i = MIN(ip, iq); if (ir < i) i = ir; tr->start = tr->nodep[i]; tr->ntips = 3; p = tr->nodep[ip]; hookup(p, tr->nodep[iq], defaultz); s = buildNewTip(tr, tr->nodep[ir]); return insert(tr, s, p, FALSE); /* Smoothing is local to s */ } /* buildSimpleTree */ char * strchr (char *str, int chr) { /* strchr */ int c; while (c = *str) {if (c == chr) return str; str++;} return (char *) NULL; } /* strchr */ char * strstr (char *str1, char *str2) { /* strstr */ char *s1, *s2; int c; while (*(s1 = str1)) { s2 = str2; do { if (! (c = *s2++)) return str1; } while (*s1++ == c); str1++; } return (char *) NULL; } /* strstr */ boolean readKeyValue (char *string, char *key, char *format, void *value) { /* readKeyValue */ if (! (string = strstr(string, key))) return FALSE; string += strlen(key); if (! 
(string = strchr(string, '='))) return FALSE; string++; return sscanf(string, format, value); /* 1 if read, otherwise 0 */ } /* readKeyValue */ #if Master || Slave double str_readTreeLikelihood (char *treestr) { /* str_readTreeLikelihood */ double lk1; char *com, *com_end; boolean readKeyValue(); if ((com = strchr(treestr, '[')) && (com < strchr(treestr, '(')) && (com_end = strchr(com, ']'))) { com++; *com_end = 0; if (readKeyValue(com, likelihood_key, "%lg", (void *) &(lk1))) { *com_end = ']'; return lk1; } } fprintf(stderr, "ERROR reading likelihood in receiveTree\n"); return badEval; } /* str_readTreeLikelihood */ boolean sendTree (comm_block *comm, tree *tr) { /* sendTree */ char *treestr; char *treeString(); # if Master void sendTreeNum(); # endif comm->done_flag = tr->likelihood > 0.0; if (comm->done_flag) write_comm_msg(comm, NULL); else { treestr = (char *) Malloc((tr->ntips * (nmlngth+32)) + 256); if (! treestr) { fprintf(stderr, "sendTree: Malloc failure\n"); return 0; } # if Master if (send_ahead >= MAX_SEND_AHEAD) { double new_likelihood; int n_to_get; n_to_get = (send_ahead+1)/2; sendTreeNum(n_to_get); send_ahead -= n_to_get; read_comm_msg(& comm_slave, treestr); new_likelihood = str_readTreeLikelihood(treestr); if (new_likelihood == badEval) return FALSE; if (! best_tr_recv || (new_likelihood > best_lk_recv)) { if (best_tr_recv) Free(best_tr_recv); best_tr_recv = Malloc(strlen(treestr) + 1); strcpy(best_tr_recv, treestr); best_lk_recv = new_likelihood; } } send_ahead++; # endif /* End #if Master */ REPORT_SEND_TREE; (void) treeString(treestr, tr, tr->start->back, 1); write_comm_msg(comm, treestr); Free(treestr); } return TRUE; } /* sendTree */ boolean receiveTree (comm_block *comm, tree *tr) { /* receiveTree */ char *treestr; boolean status; boolean str_treeReadLen(); treestr = (char *) Malloc((tr->ntips * (nmlngth+32)) + 256); if (! 
treestr) { fprintf(stderr, "receiveTree: Malloc failure\n"); return 0; } read_comm_msg(comm, treestr); if (comm->done_flag) { tr->likelihood = 1.0; status = TRUE; } else { # if Master if (best_tr_recv) { if (str_readTreeLikelihood(treestr) < best_lk_recv) { strcpy(treestr, best_tr_recv); /* Overwrite new tree with best */ } Free(best_tr_recv); best_tr_recv = NULL; } # endif /* End #if Master */ status = str_treeReadLen(treestr, tr); } Free(treestr); return status; } /* receiveTree */ void requestForWork (void) { /* requestForWork */ p4_send(DNAML_REQUEST, DNAML_DISPATCHER_ID, NULL, 0); } /* requestForWork */ #endif /* End #if Master || Slave */ #if Master void sendTreeNum(int n_to_get) { /* sendTreeNum */ char scr[512]; sprintf(scr, "%d", n_to_get); p4_send(DNAML_NUM_TREE, DNAML_MERGER_ID, scr, strlen(scr)+1); } /* sendTreeNum */ boolean getReturnedTrees (tree *tr, bestlist *bt, int n_tree_sent) /* n_tree_sent -- number of trees sent to slaves */ { /* getReturnedTrees */ void sendTreeNum(); boolean receiveTree(); sendTreeNum(send_ahead); send_ahead = 0; if (! receiveTree(& comm_slave, tr)) return FALSE; tr->smoothed = TRUE; (void) saveBestTree(bt, tr); return TRUE; } /* getReturnedTrees */ #endif void cacheZ (tree *tr) { /* cacheZ */ nodeptr p; int nodes; nodes = tr->mxtips + 3 * (tr->mxtips - 2); p = tr->nodep[1]; while (nodes-- > 0) {p->z0 = p->z; p++;} } /* cacheZ */ void restoreZ (tree *tr) { /* restoreZ */ nodeptr p; int nodes; nodes = tr->mxtips + 3 * (tr->mxtips - 2); p = tr->nodep[1]; while (nodes-- > 0) {p->z = p->z0; p++;} } /* restoreZ */ boolean testInsert (tree *tr, nodeptr p, nodeptr q, bestlist *bt, boolean fast) { /* testInsert */ double qz; nodeptr r; r = q->back; /* Save original connection */ qz = q->z; if (! insert(tr, p, q, ! fast)) return FALSE; # if ! Master if (evaluate(tr, fast ? p->next->next : tr->start) == badEval) return FALSE; (void) saveBestTree(bt, tr); # else /* Master */ tr->likelihood = unlikely; if (! 
sendTree(& comm_slave, tr)) return FALSE; # endif /* remove p from this branch */ hookup(q, r, qz); p->next->next->back = p->next->back = (nodeptr) NULL; if (! fast) { /* With fast add, other values are still OK */ restoreZ(tr); /* Restore branch lengths */ # if ! Master /* Regenerate x values */ if (! initrav(tr, p->back)) return FALSE; if (! initrav(tr, q)) return FALSE; if (! initrav(tr, r)) return FALSE; # endif } return TRUE; } /* testInsert */ int addTraverse (tree *tr, nodeptr p, nodeptr q, int mintrav, int maxtrav, bestlist *bt, boolean fast) { /* addTraverse */ int tested, newtested; tested = 0; if (--mintrav <= 0) { /* Moved minimum distance? */ if (! testInsert(tr, p, q, bt, fast)) return badRear; tested++; } if ((! q->tip) && (--maxtrav > 0)) { /* Continue traverse? */ newtested = addTraverse(tr, p, q->next->back, mintrav, maxtrav, bt, fast); if (newtested == badRear) return badRear; tested += newtested; newtested = addTraverse(tr, p, q->next->next->back, mintrav, maxtrav, bt, fast); if (newtested == badRear) return badRear; tested += newtested; } return tested; } /* addTraverse */ int rearrange (tree *tr, nodeptr p, int mintrav, int maxtrav, bestlist *bt) /* rearranges the tree, globally or locally */ { /* rearrange */ double p1z, p2z, q1z, q2z; nodeptr p1, p2, q, q1, q2; int tested, mintrav2, newtested; tested = 0; if (maxtrav < 1 || mintrav > maxtrav) return tested; /* Moving subtree forward in tree. */ if (! p->tip) { p1 = p->next->back; p2 = p->next->next->back; if (! p1->tip || ! p2->tip) { p1z = p1->z; p2z = p2->z; if (! removeNode(tr, p)) return badRear; cacheZ(tr); if (! p1->tip) { newtested = addTraverse(tr, p, p1->next->back, mintrav, maxtrav, bt, FALSE); if (newtested == badRear) return badRear; tested += newtested; newtested = addTraverse(tr, p, p1->next->next->back, mintrav, maxtrav, bt, FALSE); if (newtested == badRear) return badRear; tested += newtested; } if (! 
p2->tip) { newtested = addTraverse(tr, p, p2->next->back, mintrav, maxtrav, bt, FALSE); if (newtested == badRear) return badRear; tested += newtested; newtested = addTraverse(tr, p, p2->next->next->back, mintrav, maxtrav, bt, FALSE); if (newtested == badRear) return badRear; tested += newtested; } hookup(p->next, p1, p1z); /* Restore original tree */ hookup(p->next->next, p2, p2z); if (! (initrav(tr, tr->start) && initrav(tr, tr->start->back))) return badRear; } } /* if (! p->tip) */ /* Moving subtree backward in tree. Minimum move is 2 to avoid duplicates */ q = p->back; if (! q->tip && maxtrav > 1) { q1 = q->next->back; q2 = q->next->next->back; if (! q1->tip && (!q1->next->back->tip || !q1->next->next->back->tip) || ! q2->tip && (!q2->next->back->tip || !q2->next->next->back->tip)) { q1z = q1->z; q2z = q2->z; if (! removeNode(tr, q)) return badRear; cacheZ(tr); mintrav2 = mintrav > 2 ? mintrav : 2; if (! q1->tip) { newtested = addTraverse(tr, q, q1->next->back, mintrav2 , maxtrav, bt, FALSE); if (newtested == badRear) return badRear; tested += newtested; newtested = addTraverse(tr, q, q1->next->next->back, mintrav2 , maxtrav, bt, FALSE); if (newtested == badRear) return badRear; tested += newtested; } if (! q2->tip) { newtested = addTraverse(tr, q, q2->next->back, mintrav2 , maxtrav, bt, FALSE); if (newtested == badRear) return badRear; tested += newtested; newtested = addTraverse(tr, q, q2->next->next->back, mintrav2 , maxtrav, bt, FALSE); if (newtested == badRear) return badRear; tested += newtested; } hookup(q->next, q1, q1z); /* Restore original tree */ hookup(q->next->next, q2, q2z); if (! (initrav(tr, tr->start) && initrav(tr, tr->start->back))) return badRear; } } /* if (! q->tip && maxtrav > 1) */ /* Move other subtrees */ if (! 
p->tip) { newtested = rearrange(tr, p->next->back, mintrav, maxtrav, bt); if (newtested == badRear) return badRear; tested += newtested; newtested = rearrange(tr, p->next->next->back, mintrav, maxtrav, bt); if (newtested == badRear) return badRear; tested += newtested; } return tested; } /* rearrange */ FILE *fopen_pid (char *filenm, char *mode, char *name_pid) { /* fopen_pid */ (void) sprintf(name_pid, "%s.%d", filenm, getpid()); return fopen(name_pid, mode); } /* fopen_pid */ #if DeleteCheckpointFile void unlink_pid (char *filenm) { /* unlink_pid */ char scr[512]; (void) sprintf(scr, "%s.%d", filenm, getpid()); unlink(scr); } /* unlink_pid */ #endif void writeCheckpoint (tree *tr) { /* writeCheckpoint */ char filename[128]; FILE *checkpointf; void treeOut(); checkpointf = fopen_pid(checkpointname, "a", filename); if (checkpointf) { treeOut(checkpointf, tr, treeNewick); (void) fclose(checkpointf); } } /* writeCheckpoint */ node * findAnyTip(nodeptr p) { /* findAnyTip */ return p->tip ? p : findAnyTip(p->next->back); } /* findAnyTip */ boolean optimize (tree *tr, int maxtrav, bestlist *bt) { /* optimize */ nodeptr p; int mintrav, tested; if (tr->ntips < 4) return TRUE; writeCheckpoint(tr); /* checkpoint the starting tree */ if (maxtrav > tr->ntips - 3) maxtrav = tr->ntips - 3; if (maxtrav <= tr->opt_level) return TRUE; printf(" Doing %s rearrangements\n", (maxtrav == 1) ? "local" : (maxtrav < tr->ntips - 3) ? "regional" : "global"); /* loop while tree gets better */ do { (void) startOpt(bt, tr); mintrav = tr->opt_level + 1; /* rearrange must start from a tip or it will miss some trees */ p = findAnyTip(tr->start); tested = rearrange(tr, p->back, mintrav, maxtrav, bt); if (tested == badRear) return FALSE; # if Master if (! getReturnedTrees(tr, bt, tested)) return FALSE; # endif bt->numtrees += tested; (void) setOptLevel(bt, maxtrav); if (! 
recallBestTree(bt, 1, tr)) return FALSE; /* recover best tree */ printf(" Tested %d alternative trees\n", tested); if (bt->improved) { printf(" Ln Likelihood =%14.5f\n", tr->likelihood); } writeCheckpoint(tr); /* checkpoint the new tree */ } while (maxtrav > tr->opt_level); return TRUE; } /* optimize */ void coordinates (tree *tr, nodeptr p, double lengthsum, drawdata *tdptr) { /* coordinates */ /* establishes coordinates of nodes */ double x, z; nodeptr q, first, last; if (p->tip) { p->xcoord = NINT(over * lengthsum); p->ymax = p->ymin = p->ycoord = tdptr->tipy; tdptr->tipy += down; if (lengthsum > tdptr->tipmax) tdptr->tipmax = lengthsum; } else { q = p->next; do { z = q->z; if (z < zmin) z = zmin; x = lengthsum - tr->rdta->fracchange * log(z); coordinates(tr, q->back, x, tdptr); q = q->next; } while (p == tr->start->back ? q != p->next : q != p); first = p->next->back; q = p; while (q->next != p) q = q->next; last = q->back; p->xcoord = NINT(over * lengthsum); p->ycoord = (first->ycoord + last->ycoord)/2; p->ymin = first->ymin; p->ymax = last->ymax; } } /* coordinates */ void drawline (tree *tr, int i, double scale) /* draws one row of the tree diagram by moving up tree */ /* Modified to handle 1000 taxa, October 16, 1991 */ { /* drawline */ nodeptr p, q, r, first, last; int n, j, k, l, extra; boolean done; p = q = tr->start->back; extra = 0; if (i == p->ycoord) { k = q->number - tr->mxtips; for (j = k; j < 1000; j *= 10) putchar('-'); printf("%d", k); extra = 1; } else printf(" "); do { if (! p->tip) { r = p->next; done = FALSE; do { if ((i >= r->back->ymin) && (i <= r->back->ymax)) { q = r->back; done = TRUE; } r = r->next; } while (! done && (p == tr->start->back ? r != p->next : r != p)); first = p->next->back; r = p; while (r->next != p) r = r->next; last = r->back; if (p == tr->start->back) last = p->back; } done = (p->tip) || (p == q); n = NINT(scale*(q->xcoord - p->xcoord)); if ((n < 3) && (! 
q->tip)) n = 3; n -= extra; extra = 0; if ((q->ycoord == i) && (! done)) { if (p->ycoord != q->ycoord) putchar('+'); else putchar('-'); if (! q->tip) { k = q->number - tr->mxtips; l = n - 3; for (j = k; j < 100; j *= 10) l++; for (j = 1; j <= l; j++) putchar('-'); printf("%d", k); extra = 1; } else for (j = 1; j <= n-1; j++) putchar('-'); } else if (! p->tip) { if ((last->ycoord > i) && (first->ycoord < i) && (i != p->ycoord)) { putchar('!'); for (j = 1; j <= n-1; j++) putchar(' '); } else for (j = 1; j <= n; j++) putchar(' '); } else for (j = 1; j <= n; j++) putchar(' '); p = q; } while (! done); if ((p->ycoord == i) && p->tip) { printf(" %s", p->name); } putchar('\n'); } /* drawline */ void printTree (tree *tr, analdef *adef) /* prints out diagram of the tree */ { /* printTree */ drawdata tipdata; double scale; int i, imax; if (adef->trprint) { putchar('\n'); tipdata.tipy = 1; tipdata.tipmax = 0.0; coordinates(tr, tr->start->back, (double) 0.0, & tipdata); scale = 1.0 / tipdata.tipmax; imax = tipdata.tipy - down; for (i = 1; i <= imax; i++) drawline(tr, i, scale); printf("\nRemember: "); if (adef->root) printf("(although rooted by outgroup) "); printf("this is an unrooted tree!\n\n"); } } /* printTree */ double sigma (tree *tr, nodeptr p, double *sumlrptr) /* compute standard deviation */ { /* sigma */ likelivector *lp, *lq; double slope, sum, sumlr, z, zv, zz, lz, rat, suma, sumb, sumc, d2, d, li, temp, abzz, bczv, t3, fxpa, fxpc, fxpg, fxpt, fxpr, fxpy, fxqr, fxqy, w; double *rptr; nodeptr q; int i, *wptr; q = p->back; while ((! p->x) || (! q->x)) { if (! (p->x)) if (! newview(tr, p)) return -1.0; if (! (q->x)) if (! 
newview(tr, q)) return -1.0; } lp = p->x->lv; lq = q->x->lv; z = p->z; if (z < zmin) z = zmin; lz = log(z); wptr = &(tr->cdta->aliaswgt[0]); rptr = &(tr->cdta->patrat[0]); sum = sumlr = slope = 0.0; # ifdef Vectorize # pragma IVDEP # endif for (i = 0; i < tr->cdta->endsite; i++) { rat = *rptr++; zz = exp(rat * lz); zv = exp(rat * tr->rdta->xv * lz); fxpa = tr->rdta->freqa * lp->a; fxpg = tr->rdta->freqg * lp->g; fxpc = tr->rdta->freqc * lp->c; fxpt = tr->rdta->freqt * lp->t; fxpr = fxpa + fxpg; fxpy = fxpc + fxpt; suma = fxpa * lq->a + fxpc * lq->c + fxpg * lq->g + fxpt * lq->t; fxqr = tr->rdta->freqa * lq->a + tr->rdta->freqg * lq->g; fxqy = tr->rdta->freqc * lq->c + tr->rdta->freqt * lq->t; sumc = (fxpr + fxpy) * (fxqr + fxqy); sumb = fxpr * fxqr * tr->rdta->invfreqr + fxpy * fxqy * tr->rdta->invfreqy; abzz = zz * (suma - sumb); bczv = zv * (sumb - sumc); li = sumc + abzz + bczv; t3 = tr->rdta->xv * bczv; d = abzz + t3; d2 = rat * (abzz*(rat-1.0) + t3*(rat * tr->rdta->xv - 1.0)); temp = rat * d / li; w = *wptr++; slope += w * temp; sum += w * (temp * temp - d2/li); sumlr += w * log(li / (suma + 1.0E-300)); lp++; lq++; } *sumlrptr = sumlr; return (sum > 1.0E-300) ? 
z*(-slope + sqrt(slope*slope + 3.841*sum))/sum : 1.0; } /* sigma */ void describe (tree *tr, nodeptr p) /* print out information for one branch */ { /* describe */ double z, s, sumlr; nodeptr q; char *nameptr; int k, ch; q = p->back; printf("%4d ", q->number - tr->mxtips); if (p->tip) { nameptr = p->name; k = nmlngth; while (ch = *nameptr++) {putchar(ch); k--;} while (--k >= 0) putchar(' '); } else { printf("%4d", p->number - tr->mxtips); for (k = 4; k < nmlngth; k++) putchar(' '); } z = q->z; if (z <= zmin) printf(" infinity"); else printf("%15.5f", -log(z) * tr->rdta->fracchange); s = sigma(tr, q, & sumlr); printf(" ("); if (z + s >= zmax) printf(" zero"); else printf("%9.5f", (double) -log(z + s) * tr->rdta->fracchange); putchar(','); if (z - s <= zmin) printf(" infinity"); else printf("%12.5f", (double) -log(z - s) * tr->rdta->fracchange); putchar(')'); if (sumlr > 2.995 ) printf(" **"); else if (sumlr > 1.9205) printf(" *"); putchar('\n'); if (! p->tip) { describe(tr, p->next->back); describe(tr, p->next->next->back); } } /* describe */ void summarize (tree *tr) /* print out branch length information and node numbers */ { /* summarize */ printf("Ln Likelihood =%14.5f\n", tr->likelihood); putchar('\n'); printf(" Between And Length"); printf(" Approx. Confidence Limits\n"); printf(" ------- --- ------"); printf(" ------- ---------- ------\n"); describe(tr, tr->start->back->next->back); describe(tr, tr->start->back->next->next->back); describe(tr, tr->start); putchar('\n'); printf(" * = significantly positive, P < 0.05\n"); printf(" ** = significantly positive, P < 0.01\n\n\n"); } /* summarize */ /*=========== This is a problem if tr->start->back is a tip! 
===========*/ /* All routines should be contrived so that tr->start->back is not a tip */ char *treeString (char *treestr, tree *tr, nodeptr p, int form) /* write string with representation of tree */ /* form == 1 -> Newick tree */ /* form == 2 -> Prolog fact */ /* form == 3 -> PHYLIP tree */ { /* treeString */ double x, z; char *nameptr; int c; if (p == tr->start->back) { if (form != treePHYLIP) { if (form == treeProlog) { (void) sprintf(treestr, "phylip_tree("); while (*treestr) treestr++; /* move pointer to null */ } (void) sprintf(treestr, "[&&%s: version = '%s'", programName, programVersion); while (*treestr) treestr++; (void) sprintf(treestr, ", %s = %15.13g", likelihood_key, tr->likelihood); while (*treestr) treestr++; (void) sprintf(treestr, ", %s = %d", ntaxa_key, tr->ntips); while (*treestr) treestr++; (void) sprintf(treestr,", %s = %d", opt_level_key, tr->opt_level); while (*treestr) treestr++; (void) sprintf(treestr, ", %s = %d", smoothed_key, tr->smoothed); while (*treestr) treestr++; (void) sprintf(treestr, "]%s", form == treeProlog ? ", " : " "); while (*treestr) treestr++; } } if (p->tip) { if (form != treePHYLIP) *treestr++ = '\''; nameptr = p->name; while (c = *nameptr++) { if (form != treePHYLIP) {if (c == '\'') *treestr++ = '\'';} else if (c == ' ') {c = '_';} *treestr++ = c; } if (form != treePHYLIP) *treestr++ = '\''; } else { *treestr++ = '('; treestr = treeString(treestr, tr, p->next->back, form); *treestr++ = ','; treestr = treeString(treestr, tr, p->next->next->back, form); if (p == tr->start->back) { *treestr++ = ','; treestr = treeString(treestr, tr, p->back, form); } *treestr++ = ')'; } if (p == tr->start->back) { (void) sprintf(treestr, ":0.0%s\n", (form != treeProlog) ? 
";" : ")."); } else { z = p->z; if (z < zmin) z = zmin; x = -log(z) * tr->rdta->fracchange; (void) sprintf(treestr, ": %8.6f", x); /* prolog needs the space */ } while (*treestr) treestr++; /* move pointer up to null termination */ return treestr; } /* treeString */ void treeOut (FILE *treefile, tree *tr, int form) /* write out file with representation of final tree */ { /* treeOut */ int c; char *cptr, *treestr; treestr = (char *) Malloc((tr->ntips * (nmlngth+32)) + 256); if (! treestr) { fprintf(stderr, "treeOut: Malloc failure\n"); exit(1); } (void) treeString(treestr, tr, tr->start->back, form); cptr = treestr; while (c = *cptr++) putc(c, treefile); Free(treestr); } /* treeOut */ /*=======================================================================*/ /* Read a tree from a file */ /*=======================================================================*/ /* 1.0.A Processing of quotation marks in comment removed */ int treeFinishCom (FILE *fp, char **strp) { /* treeFinishCom */ int ch; while ((ch = getc(fp)) != EOF && ch != ']') { if (strp != NULL) *(*strp)++ = ch; /* save character */ if (ch == '[') { /* nested comment; find its end */ if ((ch = treeFinishCom(fp, strp)) == EOF) break; if (strp != NULL) *(*strp)++ = ch; /* save closing ] */ } } if (strp != NULL) **strp = '\0'; /* terminate string */ return ch; } /* treeFinishCom */ int treeGetCh (FILE *fp) /* get next nonblank, noncomment character */ { /* treeGetCh */ int ch; while ((ch = getc(fp)) != EOF) { if (whitechar(ch)) ; else if (ch == '[') { /* comment; find its end */ if ((ch = treeFinishCom(fp, (char **) NULL)) == EOF) break; } else break; } return ch; } /* treeGetCh */ boolean treeLabelEnd (int ch) { /* treeLabelEnd */ switch (ch) { case EOF: case '\0': case '\t': case '\n': case ' ': case ':': case ',': case '(': case ')': case '[': case ';': return TRUE; default: break; } return FALSE; } /* treeLabelEnd */ boolean treeGetLabel (FILE *fp, char *lblPtr, int maxlen) { /* treeGetLabel */ int ch; 
boolean done, quoted, lblfound; if (--maxlen < 0) lblPtr = (char *) NULL; /* reserves space for '\0' */ else if (lblPtr == NULL) maxlen = 0; ch = getc(fp); done = treeLabelEnd(ch); lblfound = ! done; quoted = (ch == '\''); if (quoted && ! done) {ch = getc(fp); done = (ch == EOF);} while (! done) { if (quoted) { if (ch == '\'') {ch = getc(fp); if (ch != '\'') break;} } else if (treeLabelEnd(ch)) break; else if (ch == '_') ch = ' '; /* unquoted _ goes to space */ if (--maxlen >= 0) *lblPtr++ = ch; ch = getc(fp); if (ch == EOF) break; } if (ch != EOF) (void) ungetc(ch, fp); if (lblPtr != NULL) *lblPtr = '\0'; return lblfound; } /* treeGetLabel */ boolean treeFlushLabel (FILE *fp) { /* treeFlushLabel */ return treeGetLabel(fp, (char *) NULL, (int) 0); } /* treeFlushLabel */ int treeFindTipByLabel (char *str, tree *tr) /* str -- label string pointer */ { /* treeFindTipByLabel */ nodeptr q; char *nameptr; int ch, i, n; boolean found; for (n = 1; n <= tr->mxtips; n++) { q = tr->nodep[n]; if (! (q->back)) { /* Only consider unused tips */ i = 0; nameptr = q->name; while ((found = (str[i++] == (ch = *nameptr++))) && ch) ; if (found) return n; } } printf("ERROR: Cannot find tree species: %s\n", str); return 0; } /* treeFindTipByLabel */ int treeFindTipName (FILE *fp, tree *tr) { /* treeFindTipName */ char *nameptr, str[nmlngth+2]; int n; if (tr->prelabeled) { if (treeGetLabel(fp, str, nmlngth+2)) n = treeFindTipByLabel(str, tr); else n = 0; } else if (tr->ntips < tr->mxtips) { n = tr->ntips + 1; nameptr = tr->nodep[n]->name; if (! treeGetLabel(fp, nameptr, nmlngth+1)) n = 0; } else { n = 0; } return n; } /* treeFindTipName */ void treeEchoContext (FILE *fp1, FILE *fp2, int n) { /* treeEchoContext */ int ch; boolean waswhite; waswhite = TRUE; while (n > 0 && ((ch = getc(fp1)) != EOF)) { if (whitechar(ch)) { ch = waswhite ? 
'\0' : ' '; waswhite = TRUE; } else { waswhite = FALSE; } if (ch > '\0') {putc(ch, fp2); n--;} } } /* treeEchoContext */ boolean treeProcessLength (FILE *fp, double *dptr) { /* treeProcessLength */ int ch; if ((ch = treeGetCh(fp)) == EOF) return FALSE; /* Skip comments */ (void) ungetc(ch, fp); if (fscanf(fp, "%lf", dptr) != 1) { printf("ERROR: treeProcessLength: Problem reading branch length\n"); treeEchoContext(fp, stdout, 40); printf("\n"); return FALSE; } return TRUE; } /* treeProcessLength */ boolean treeFlushLen (FILE *fp) { /* treeFlushLen */ double dummy; int ch; if ((ch = treeGetCh(fp)) == ':') return treeProcessLength(fp, & dummy); if (ch != EOF) (void) ungetc(ch, fp); return TRUE; } /* treeFlushLen */ boolean treeNeedCh (FILE *fp, int c1, char *where) { /* treeNeedCh */ int c2; if ((c2 = treeGetCh(fp)) == c1) return TRUE; printf("ERROR: Expecting '%c' %s tree; found:", c1, where); if (c2 == EOF) { printf("End-of-File"); } else { ungetc(c2, fp); treeEchoContext(fp, stdout, 40); } putchar('\n'); return FALSE; } /* treeNeedCh */ boolean addElementLen (FILE *fp, tree *tr, nodeptr p) { /* addElementLen */ double z, branch; nodeptr q; int n, ch; if ((ch = treeGetCh(fp)) == '(') { /* A new internal node */ n = (tr->nextnode)++; if (n > 2*(tr->mxtips) - 2) { if (tr->rooted || n > 2*(tr->mxtips) - 1) { printf("ERROR: Too many internal nodes. Is tree rooted?\n"); printf(" Deepest splitting should be a trifurcation.\n"); return FALSE; } else { tr->rooted = TRUE; } } q = tr->nodep[n]; if (! addElementLen(fp, tr, q->next)) return FALSE; if (! treeNeedCh(fp, ',', "in")) return FALSE; if (! addElementLen(fp, tr, q->next->next)) return FALSE; if (! treeNeedCh(fp, ')', "in")) return FALSE; (void) treeFlushLabel(fp); } else { /* A new tip */ ungetc(ch, fp); if ((n = treeFindTipName(fp, tr)) <= 0) return FALSE; q = tr->nodep[n]; if (tr->start->number > n) tr->start = q; (tr->ntips)++; } /* End of tip processing */ if (tr->userlen) { if (! 
treeNeedCh(fp, ':', "in")) return FALSE; if (! treeProcessLength(fp, & branch)) return FALSE; z = exp(-branch / tr->rdta->fracchange); if (z > zmax) z = zmax; hookup(p, q, z); } else { if (! treeFlushLen(fp)) return FALSE; hookup(p, q, defaultz); } return TRUE; } /* addElementLen */ int saveTreeCom (char **comstrp) { /* saveTreeCom */ int ch; boolean inquote; inquote = FALSE; while ((ch = getc(INFILE)) != EOF && (inquote || ch != ']')) { *(*comstrp)++ = ch; /* save character */ if (ch == '[' && ! inquote) { /* comment; find its end */ if ((ch = saveTreeCom(comstrp)) == EOF) break; *(*comstrp)++ = ch; /* add ] */ } else if (ch == '\'') inquote = ! inquote; /* start or end of quote */ } return ch; } /* saveTreeCom */ boolean processTreeCom (FILE *fp, tree *tr) { /* processTreeCom */ int text_started, functor_read, com_open; /* Accept prefatory "phylip_tree(" or "pseudoNewick(" */ functor_read = text_started = 0; (void) fscanf(fp, " p%nhylip_tree(%n", & text_started, & functor_read); if (text_started && ! functor_read) { (void) fscanf(fp, "seudoNewick(%n", & functor_read); if (! functor_read) { printf("Start of tree 'p...' 
not understood.\n"); return FALSE; } } com_open = 0; (void) fscanf(fp, " [%n", & com_open); if (com_open) { /* comment; read it */ char com[1024], *com_end; com_end = com; if (treeFinishCom(fp, & com_end) == EOF) { /* omits enclosing []s */ printf("Missing end of tree comment\n"); return FALSE; } (void) readKeyValue(com, likelihood_key, "%lg", (void *) &(tr->likelihood)); (void) readKeyValue(com, opt_level_key, "%d", (void *) &(tr->opt_level)); (void) readKeyValue(com, smoothed_key, "%d", (void *) &(tr->smoothed)); if (functor_read) (void) fscanf(fp, " ,"); /* remove trailing comma */ } return (functor_read > 0); } /* processTreeCom */ nodeptr uprootTree (tree *tr, nodeptr p) { /* uprootTree */ nodeptr q, r, s, start; int n; if (p->tip || p->back) { printf("ERROR: Unable to uproot tree.\n"); printf(" Inappropriate node marked for removal.\n"); return (nodeptr) NULL; } n = --(tr->nextnode); /* last internal node added */ if (n != tr->mxtips + tr->ntips - 1) { printf("ERROR: Unable to uproot tree. Inconsistent\n"); printf(" number of tips and nodes for rooted tree.\n"); return (nodeptr) NULL; } q = p->next->back; /* remove p from tree */ r = p->next->next->back; hookup(q, r, tr->userlen ? (q->z * r->z) : defaultz); start = (r->tip || (! q->tip)) ? 
                r : r->next->next->back;

    if (tr->ntips > 2 && p->number != n) {
      q = tr->nodep[n];          /* transfer last node's connections to p */
      r = q->next;
      s = q->next->next;
      hookup(p, q->back, q->z);  /* move connections to p */
      hookup(p->next, r->back, r->z);
      hookup(p->next->next, s->back, s->z);
      if (start->number == q->number) start = start->back->back;
      q->back = r->back = s->back = (nodeptr) NULL;
    }
    else {
      p->back = p->next->back = p->next->next->back = (nodeptr) NULL;
    }

    tr->rooted = FALSE;
    return start;
  } /* uprootTree */


boolean treeReadLen (FILE *fp, tree *tr)
  { /* treeReadLen */
    nodeptr p;
    int i, ch;
    boolean is_fact;

    for (i = 1; i <= tr->mxtips; i++) tr->nodep[i]->back = (node *) NULL;
    tr->start = tr->nodep[tr->mxtips];
    tr->ntips = 0;
    tr->nextnode = tr->mxtips + 1;
    tr->opt_level = 0;
    tr->log_f_valid = 0;
    tr->smoothed = FALSE;
    tr->rooted = FALSE;

    is_fact = processTreeCom(fp, tr);

    p = tr->nodep[(tr->nextnode)++];
    if (! treeNeedCh(fp, '(', "at start of")) return FALSE;
    if (! addElementLen(fp, tr, p)) return FALSE;
    if (! treeNeedCh(fp, ',', "in")) return FALSE;
    if (! addElementLen(fp, tr, p->next)) return FALSE;
    if (! tr->rooted) {
      if ((ch = treeGetCh(fp)) == ',') {  /* An unrooted format */
        if (! addElementLen(fp, tr, p->next->next)) return FALSE;
      }
      else {                              /* A rooted format */
        tr->rooted = TRUE;
        if (ch != EOF) (void) ungetc(ch, fp);
      }
    }
    else {
      p->next->next->back = (nodeptr) NULL;
    }
    if (! treeNeedCh(fp, ')', "in")) return FALSE;
    (void) treeFlushLabel(fp);
    if (! treeFlushLen(fp)) return FALSE;
    if (is_fact) {
      if (! treeNeedCh(fp, ')', "at end of")) return FALSE;
      if (! treeNeedCh(fp, '.', "at end of")) return FALSE;
    }
    else {
      if (! treeNeedCh(fp, ';', "at end of")) return FALSE;
    }

    if (tr->rooted) {
      p->next->next->back = (nodeptr) NULL;
      tr->start = uprootTree(tr, p->next->next);
      if (!
          tr->start) return FALSE;
    }
    else {
      tr->start = p->next->next->back;  /* This is start used by treeString */
    }

    return (initrav(tr, tr->start) && initrav(tr, tr->start->back));
  } /* treeReadLen */


/*=======================================================================*/
/* Read a tree from a string */
/*=======================================================================*/

#if Master || Slave

int str_treeFinishCom (char **treestrp, char **strp)
    /* treestrp -- tree string pointer */
    /* strp -- comment string pointer */
  { /* str_treeFinishCom */
    int ch;

    /* ch is a character, so test against '\0' rather than NULL */
    while ((ch = *(*treestrp)++) != '\0' && ch != ']') {
      if (strp != NULL) *(*strp)++ = ch;    /* save character */
      if (ch == '[') {                      /* nested comment; find its end */
        /* pass strp through so that nested comment text is saved as well */
        if ((ch = str_treeFinishCom(treestrp, strp)) == '\0') break;
        if (strp != NULL) *(*strp)++ = ch;  /* save closing ] */
      }
    }
    if (strp != NULL) **strp = '\0';        /* terminate string */
    return ch;
  } /* str_treeFinishCom */


int str_treeGetCh (char **treestrp)
    /* get next nonblank, noncomment character */
  { /* str_treeGetCh */
    int ch;

    while ((ch = *(*treestrp)++) != '\0') {
      if (whitechar(ch)) ;
      else if (ch == '[') {                 /* comment; find its end */
        if ((ch = str_treeFinishCom(treestrp, (char **) NULL)) == '\0') break;
      }
      else break;
    }

    return ch;
  } /* str_treeGetCh */


boolean str_treeGetLabel (char **treestrp, char *lblPtr, int maxlen)
  { /* str_treeGetLabel */
    int ch;
    boolean done, quoted, lblfound;

    if (--maxlen < 0) lblPtr = (char *) NULL;  /* reserves space for '\0' */
    else if (lblPtr == NULL) maxlen = 0;

    ch = *(*treestrp)++;
    done = treeLabelEnd(ch);

    lblfound = ! done;
    quoted = (ch == '\'');
    if (quoted && ! done) {ch = *(*treestrp)++; done = (ch == '\0');}

    while (!
done) { if (quoted) { if (ch == '\'') {ch = *(*treestrp)++; if (ch != '\'') break;} } else if (treeLabelEnd(ch)) break; else if (ch == '_') ch = ' '; /* unquoted _ goes to space */ if (--maxlen >= 0) *lblPtr++ = ch; ch = *(*treestrp)++; if (ch == '\0') break; } (*treestrp)--; if (lblPtr != NULL) *lblPtr = '\0'; return lblfound; } /* str_treeGetLabel */ boolean str_treeFlushLabel (char **treestrp) { /* str_treeFlushLabel */ return str_treeGetLabel(treestrp, (char *) NULL, (int) 0); } /* str_treeFlushLabel */ int str_treeFindTipName (char **treestrp, tree *tr) { /* str_treeFindTipName */ nodeptr q; char *nameptr, str[nmlngth+2]; int ch, i, n; if (tr->prelabeled) { if (str_treeGetLabel(treestrp, str, nmlngth+2)) n = treeFindTipByLabel(str, tr); else n = 0; } else if (tr->ntips < tr->mxtips) { n = tr->ntips + 1; nameptr = tr->nodep[n]->name; if (! str_treeGetLabel(treestrp, nameptr, nmlngth+1)) n = 0; } else { n = 0; } return n; } /* str_treeFindTipName */ boolean str_treeProcessLength (char **treestrp, double *dptr) { /* str_treeProcessLength */ int used; if(! 
       str_treeGetCh(treestrp)) return FALSE;  /* Skip comments */
    (*treestrp)--;

    if (sscanf(*treestrp, "%lf%n", dptr, & used) != 1) {
      printf("ERROR: str_treeProcessLength: Problem reading branch length\n");
      printf("%.40s\n", *treestrp);         /* echo at most 40 characters */
      *dptr = 0.0;
      return FALSE;
    }
    else {
      *treestrp += used;
    }

    return TRUE;
  } /* str_treeProcessLength */


boolean str_treeFlushLen (char **treestrp)
  { /* str_treeFlushLen */
    double dummy;   /* receives the discarded length; sscanf needs a valid target */
    int ch;

    if ((ch = str_treeGetCh(treestrp)) == ':')
      return str_treeProcessLength(treestrp, & dummy);
    else {
      (*treestrp)--;
      return TRUE;
    }
  } /* str_treeFlushLen */


boolean str_treeNeedCh (char **treestrp, int c1, char *where)
  { /* str_treeNeedCh */
    int c2, i;

    if ((c2 = str_treeGetCh(treestrp)) == c1) return TRUE;

    printf("ERROR: Missing '%c' %s tree; ", c1, where);
    if (c2 == '\0')
      printf("end-of-string");
    else {
      putchar('"');
      for (i = 24; i-- && (c2 != '\0'); c2 = *(*treestrp)++) putchar(c2);
      putchar('"');
    }

    printf(" found instead\n");
    return FALSE;
  } /* str_treeNeedCh */


boolean str_addElementLen (char **treestrp, tree *tr, nodeptr p)
  { /* str_addElementLen */
    double z, branch;
    nodeptr q;
    int n, ch;

    if ((ch = str_treeGetCh(treestrp)) == '(') {  /* A new internal node */
      n = (tr->nextnode)++;
      if (n > 2*(tr->mxtips) - 2) {
        if (tr->rooted || n > 2*(tr->mxtips) - 1) {
          printf("ERROR: too many internal nodes. Is tree rooted?\n");
          printf("Deepest splitting should be a trifurcation.\n");
          return FALSE;
        }
        else {
          tr->rooted = TRUE;
        }
      }
      q = tr->nodep[n];
      if (! str_addElementLen(treestrp, tr, q->next)) return FALSE;
      if (! str_treeNeedCh(treestrp, ',', "in")) return FALSE;
      if (! str_addElementLen(treestrp, tr, q->next->next)) return FALSE;
      if (! str_treeNeedCh(treestrp, ')', "in")) return FALSE;
      if (!
str_treeFlushLabel(treestrp)) return FALSE; } else { /* A new tip */ n = str_treeFindTipName(treestrp, tr); if (n <= 0) return FALSE; q = tr->nodep[n]; if (tr->start->number > n) tr->start = q; (tr->ntips)++; } /* End of tip processing */ /* Master and Slave always use lengths */ if (! str_treeNeedCh(treestrp, ':', "in")) return FALSE; if (! str_treeProcessLength(treestrp, & branch)) return FALSE; z = exp(-branch / tr->rdta->fracchange); if (z > zmax) z = zmax; hookup(p, q, z); return TRUE; } /* str_addElementLen */ boolean str_processTreeCom(tree *tr, char **treestrp) { /* str_processTreeCom */ char *com, *com_end; int text_started, functor_read, com_open; com = *treestrp; functor_read = text_started = 0; sscanf(com, " p%nhylip_tree(%n", & text_started, & functor_read); if (functor_read) { com += functor_read; } else if (text_started) { com += text_started; sscanf(com, "seudoNewick(%n", & functor_read); if (! functor_read) { printf("Start of tree 'p...' not understood.\n"); return FALSE; } else { com += functor_read; } } com_open = 0; sscanf(com, " [%n", & com_open); com += com_open; if (com_open) { /* comment; read it */ if (!(com_end = strchr(com, ']'))) { printf("Missing end of tree comment.\n"); return FALSE; } *com_end = 0; (void) readKeyValue(com, likelihood_key, "%lg", (void *) &(tr->likelihood)); (void) readKeyValue(com, opt_level_key, "%d", (void *) &(tr->opt_level)); (void) readKeyValue(com, smoothed_key, "%d", (void *) &(tr->smoothed)); *com_end = ']'; com_end++; if (functor_read) { /* remove trailing comma */ text_started = 0; sscanf(com_end, " ,%n", & text_started); com_end += text_started; } *treestrp = com_end; } return (functor_read > 0); } /* str_processTreeCom */ boolean str_treeReadLen (char *treestr, tree *tr) /* read string with representation of tree */ { /* str_treeReadLen */ nodeptr p; int i; boolean is_fact, found; for (i = 1; i <= tr->mxtips; i++) tr->nodep[i]->back = (node *) NULL; tr->start = tr->nodep[tr->mxtips]; tr->ntips = 0; 
tr->nextnode = tr->mxtips + 1; tr->opt_level = 0; tr->log_f_valid = 0; tr->smoothed = Master; tr->rooted = FALSE; is_fact = str_processTreeCom(tr, & treestr); p = tr->nodep[(tr->nextnode)++]; if (! str_treeNeedCh(& treestr, '(', "at start of")) return FALSE; if (! str_addElementLen(& treestr, tr, p)) return FALSE; if (! str_treeNeedCh(& treestr, ',', "in")) return FALSE; if (! str_addElementLen(& treestr, tr, p->next)) return FALSE; if (! tr->rooted) { if (str_treeGetCh(& treestr) == ',') { /* An unrooted format */ if (! str_addElementLen(& treestr, tr, p->next->next)) return FALSE; } else { /* A rooted format */ p->next->next->back = (nodeptr) NULL; tr->rooted = TRUE; treestr--; } } if (! str_treeNeedCh(& treestr, ')', "in")) return FALSE; if (! str_treeFlushLabel(& treestr)) return FALSE; if (! str_treeFlushLen(& treestr)) return FALSE; if (is_fact) { if (! str_treeNeedCh(& treestr, ')', "at end of")) return FALSE; if (! str_treeNeedCh(& treestr, '.', "at end of")) return FALSE; } else { if (! str_treeNeedCh(& treestr, ';', "at end of")) return FALSE; } if (tr->rooted) if (! uprootTree(tr, p->next->next)) return FALSE; tr->start = p->next->next->back; /* This is start used by treeString */ return (initrav(tr, tr->start) && initrav(tr, tr->start->back)); } /* str_treeReadLen */ #endif boolean treeEvaluate (tree *tr, bestlist *bt) /* Evaluate a user tree */ { /* treeEvaluate */ if (Slave || ! tr->userlen) { if (! smoothTree(tr, 4 * smoothings)) return FALSE; } if (evaluate(tr, tr->start) == badEval) return FALSE; # if ! 
Slave (void) saveBestTree(bt, tr); # endif return TRUE; } /* treeEvaluate */ #if Master || Slave FILE *freopen_pid (char *filenm, char *mode, FILE *stream) { /* freopen_pid */ char scr[512]; (void) sprintf(scr, "%s.%d", filenm, getpid()); return freopen(scr, mode, stream); } /* freopen_pid */ #endif boolean showBestTrees (bestlist *bt, tree *tr, analdef *adef, FILE *treefile) { /* showBestTrees */ int rank; for (rank = 1; rank <= bt->nvalid; rank++) { if (rank > 1) { if (rank != recallBestTree(bt, rank, tr)) break; } if (evaluate(tr, tr->start) == badEval) return FALSE; if (tr->outgrnode->back) tr->start = tr->outgrnode; printTree(tr, adef); summarize(tr); if (treefile) treeOut(treefile, tr, adef->trout); } return TRUE; } /* showBestTrees */ boolean cmpBestTrees (bestlist *bt, tree *tr) { /* cmpBestTrees */ double sum, sum2, sd, temp, wtemp, bestscore; double *log_f0, *log_f0_ptr; /* Save a copy of best log_f */ double *log_f_ptr; int i, j, num, besttips; num = bt->nvalid; if ((num <= 1) || (tr->cdta->wgtsum <= 1)) return TRUE; if (! (log_f0 = (double *) Malloc(sizeof(double) * tr->cdta->endsite))) { printf("ERROR: cmpBestTrees unable to obtain space for log_f0\n"); return FALSE; } printf("Tree Ln L Diff Ln L Its S.D."); printf(" Significantly worse?\n\n"); for (i = 1; i <= num; i++) { if (i != recallBestTree(bt, i, tr)) break; if (! 
(tr->log_f_valid)) { if (evaluate(tr, tr->start) == badEval) return FALSE; } printf("%3d%14.5f", i, tr->likelihood); if (i == 1) { printf(" <------ best\n"); besttips = tr->ntips; bestscore = tr->likelihood; log_f0_ptr = log_f0; log_f_ptr = tr->log_f; for (j = 0; j < tr->cdta->endsite; j++) *log_f0_ptr++ = *log_f_ptr++; } else if (tr->ntips != besttips) printf(" (different number of species)\n"); else { sum = sum2 = 0.0; log_f0_ptr = log_f0; log_f_ptr = tr->log_f; for (j = 0; j < tr->cdta->endsite; j++) { temp = *log_f0_ptr++ - *log_f_ptr++; wtemp = tr->cdta->aliaswgt[j] * temp; sum += wtemp; sum2 += wtemp * temp; } sd = sqrt( tr->cdta->wgtsum * (sum2 - sum*sum / tr->cdta->wgtsum) / (tr->cdta->wgtsum - 1) ); printf("%14.5f%14.4f", tr->likelihood - bestscore, sd); printf(" %s\n", (sum > 1.95996 * sd) ? "Yes" : " No"); } } Free(log_f0); printf("\n\n"); return TRUE; } /* cmpBestTrees */ boolean makeUserTree (tree *tr, bestlist *bt, analdef *adef) { /* makeUserTree */ char filename[128]; FILE *treefile; int nusertrees, which; nusertrees = adef->numutrees; printf("User-defined %s:\n\n", (nusertrees == 1) ? "tree" : "trees"); treefile = adef->trout ? fopen_pid("treefile", "w", filename) : (FILE *) NULL; for (which = 1; which <= nusertrees; which++) { if (! treeReadLen(INFILE, tr)) return FALSE; if (! treeEvaluate(tr, bt)) return FALSE; if (tr->global <= 0) { if (tr->outgrnode->back) tr->start = tr->outgrnode; printTree(tr, adef); summarize(tr); if (treefile) treeOut(treefile, tr, adef->trout); } else { printf("%6d: Ln Likelihood =%14.5f\n", which, tr->likelihood); } } if (tr->global > 0) { putchar('\n'); if (! recallBestTree(bt, 1, tr)) return FALSE; printf(" Ln Likelihood =%14.5f\n", tr->likelihood); if (! 
optimize(tr, tr->global, bt)) return FALSE; if (tr->outgrnode->back) tr->start = tr->outgrnode; printTree(tr, adef); summarize(tr); if (treefile) treeOut(treefile, tr, adef->trout); } if (treefile) { (void) fclose(treefile); printf("Tree also written to %s\n", filename); } putchar('\n'); (void) cmpBestTrees(bt, tr); return TRUE; } /* makeUserTree */ #if Slave boolean slaveTreeEvaluate (tree *tr, bestlist *bt) { /* slaveTreeEvaluate */ boolean done; do { requestForWork(); if (! receiveTree(& comm_master, tr)) return FALSE; done = tr->likelihood > 0.0; if (! done) { if (! treeEvaluate(tr, bt)) return FALSE; if (! sendTree(& comm_master, tr)) return FALSE; } } while (! done); return TRUE; } /* slaveTreeEvaluate */ #endif double randum (long *seed) /* random number generator, modified to use 12 bit chunks */ { /* randum */ long sum, mult0, mult1, seed0, seed1, seed2, newseed0, newseed1, newseed2; mult0 = 1549; seed0 = *seed & 4095; sum = mult0 * seed0; newseed0 = sum & 4095; sum >>= 12; seed1 = (*seed >> 12) & 4095; mult1 = 406; sum += mult0 * seed1 + mult1 * seed0; newseed1 = sum & 4095; sum >>= 12; seed2 = (*seed >> 24) & 255; sum += mult0 * seed2 + mult1 * seed1; newseed2 = sum & 255; *seed = newseed2 << 24 | newseed1 << 12 | newseed0; return 0.00390625 * (newseed2 + 0.000244140625 * (newseed1 + 0.000244140625 * newseed0)); } /* randum */ boolean makeDenovoTree (tree *tr, bestlist *bt, analdef *adef) { /* makeDenovoTree */ char filename[128]; FILE *treefile; nodeptr p; int *enterorder; /* random entry order */ int i, j, k, nextsp, newsp, maxtrav, tested; double randum(); enterorder = (int *) Malloc(sizeof(int) * (tr->mxtips + 1)); if (! enterorder) { fprintf(stderr, "makeDenovoTree: Malloc failure for enterorder\n"); return 0; } if (adef->restart) { printf("Restarting from tree with the following sequences:\n"); tr->userlen = TRUE; if (! treeReadLen(INFILE, tr)) return FALSE; if (! 
smoothTree(tr, smoothings)) return FALSE; if (evaluate(tr, tr->start) == badEval) return FALSE; if (saveBestTree(bt, tr) < 1) return FALSE; for (i = 1, j = tr->ntips; i <= tr->mxtips; i++) { /* find loose tips */ if (! tr->nodep[i]->back) { enterorder[++j] = i; } else { printf(" %s\n", tr->nodep[i]->name); # if Master if (i>3) REPORT_ADD_SPECS; # endif } } putchar('\n'); } else { /* start from scratch */ tr->ntips = 0; for (i = 1; i <= tr->mxtips; i++) enterorder[i] = i; } if (adef->jumble) for (i = tr->ntips + 1; i <= tr->mxtips; i++) { j = randum(&(adef->jumble))*(tr->mxtips - tr->ntips) + tr->ntips + 1; k = enterorder[j]; enterorder[j] = enterorder[i]; enterorder[i] = k; } bt->numtrees = 1; if (tr->ntips < tr->mxtips) printf("Adding species:\n"); if (tr->ntips == 0) { for (i = 1; i <= 3; i++) { printf(" %s\n", tr->nodep[enterorder[i]]->name); } tr->nextnode = tr->mxtips + 1; if (! buildSimpleTree(tr, enterorder[1], enterorder[2], enterorder[3])) return FALSE; } while (tr->ntips < tr->mxtips || tr->opt_level < tr->global) { maxtrav = (tr->ntips == tr->mxtips) ? tr->global : tr->partswap; if (maxtrav > tr->ntips - 3) maxtrav = tr->ntips - 3; if (tr->opt_level >= maxtrav) { nextsp = ++(tr->ntips); newsp = enterorder[nextsp]; p = tr->nodep[newsp]; printf(" %s\n", p->name); # if Master if (nextsp % DNAML_STEP_TIME_COUNT == 1) { REPORT_STEP_TIME; } REPORT_ADD_SPECS; # endif (void) buildNewTip(tr, p); resetBestTree(bt); cacheZ(tr); tested = addTraverse(tr, p->back, findAnyTip(tr->start)->back, 1, tr->ntips - 2, bt, adef->qadd); if (tested == badRear) return FALSE; bt->numtrees += tested; # if Master getReturnedTrees(tr, bt, tested); # endif printf(" Tested %d alternative trees\n", tested); (void) recallBestTree(bt, 1, tr); if (! tr->smoothed) { if (! 
smoothTree(tr, smoothings)) return FALSE; if (evaluate(tr, tr->start) == badEval) return FALSE; (void) saveBestTree(bt, tr); } if (tr->ntips == 4) tr->opt_level = 1; /* All 4 taxon trees done */ maxtrav = (tr->ntips == tr->mxtips) ? tr->global : tr->partswap; if (maxtrav > tr->ntips - 3) maxtrav = tr->ntips - 3; } printf(" Ln Likelihood =%14.5f\n", tr->likelihood); if (! optimize(tr, maxtrav, bt)) return FALSE; } printf("\nExamined %d %s\n", bt->numtrees, bt->numtrees != 1 ? "trees" : "tree"); treefile = adef->trout ? fopen_pid("treefile", "w", filename) : (FILE *) NULL; (void) showBestTrees(bt, tr, adef, treefile); if (treefile) { (void) fclose(treefile); printf("Tree also written to %s\n\n", filename); } (void) cmpBestTrees(bt, tr); # if DeleteCheckpointFile unlink_pid(checkpointname); # endif Free(enterorder); return TRUE; } /* makeDenovoTree */ /*==========================================================================*/ /* "main" routine */ /*==========================================================================*/ #if Sequential main () #else slave () #endif { /* DNA Maximum Likelihood */ # if Master int starttime, inputtime, endtime; # endif # if Master || Slave int my_id, nprocs, type, from, sz; char *msg; # endif analdef *adef; rawdata *rdta; cruncheddata *cdta; tree *tr; /* current tree */ bestlist *bt; /* topology of best found tree */ # if Debug { char debugfilename[128]; debug = fopen_pid("dnaml_debug", "w", debugfilename); } # endif # if Master starttime = p4_clock(); nprocs = p4_num_total_slaves(); if ((OUTFILE = freopen_pid("master.out", "w", stdout)) == NULL) { fprintf(stderr, "Could not open output file\n"); exit(1); } /* Receive input file name from host */ type = DNAML_FILE_NAME; from = DNAML_HOST_ID; msg = NULL; p4_recv(& type, & from, & msg, & sz); if ((INFILE = fopen(msg, "r")) == NULL) { fprintf(stderr, "master could not open input file %s\n", msg); exit(1); } p4_msg_free(msg); open_link(& comm_slave); # endif # if DNAML_STEP 
begin_step_time = starttime; # endif # if Slave my_id = p4_get_my_id(); nprocs = p4_num_total_slaves(); /* Receive input file name from host */ type = DNAML_FILE_NAME; from = DNAML_HOST_ID; msg = NULL; p4_recv(& type, & from, & msg, & sz); if ((INFILE = fopen(msg, "r")) == NULL) { fprintf(stderr, "slave could not open input file %s\n",msg); exit(1); } p4_msg_free(msg); # ifdef P4DEBUG if ((OUTFILE = freopen_pid("slave.out", "w", stdout)) == NULL) { fprintf(stderr, "Could not open output file\n"); exit(1); } # else if ((OUTFILE = freopen("/dev/null", "w", stdout)) == NULL) { fprintf(stderr, "Could not open output file\n"); exit(1); } # endif open_link(& comm_master); # endif /* Get data structure memory */ if (! (adef = (analdef *) Malloc(sizeof(analdef)))) { printf("ERROR: Unable to get memory for analysis definition\n\n"); return 1; } if (! (rdta = (rawdata *) Malloc(sizeof(rawdata)))) { printf("ERROR: Unable to get memory for raw DNA\n\n"); return 1; } if (! (cdta = (cruncheddata *) Malloc(sizeof(cruncheddata)))) { printf("ERROR: Unable to get memory for crunched DNA\n\n"); return 1; } if ((tr = (tree *) Malloc(sizeof(tree))) && (bt = (bestlist *) Malloc(sizeof(bestlist)))) ; else { printf("ERROR: Unable to get memory for trees\n\n"); return 1; } bt->ninit = 0; if (! getinput(adef, rdta, cdta, tr)) return 1; # if Master inputtime = p4_clock(); printf("Input time %d milliseconds\n", inputtime - starttime); REPORT_STEP_TIME; # endif # if Slave (void) fclose(INFILE); # endif /* The material below would be a loop over jumbles and/or boots */ if (! makeweights(adef, rdta, cdta)) return 1; if (! makevalues(rdta, cdta)) return 1; if (adef->empf && ! empiricalfreqs(rdta, cdta)) return 1; reportfreqs(adef, rdta); if (! linkdata2tree(rdta, cdta, tr)) return 1; if (! linkxarray(3, 3, cdta->endsite, & freextip, & usedxtip)) return 1; if (! setupnodex(tr)) return 1; # if Slave if (! slaveTreeEvaluate(tr, bt)) return 1; # else if (! 
initBestTree(bt, adef->nkeep, tr->mxtips, tr->cdta->endsite)) return 1;

    if (! adef->usertree) {
      if (! makeDenovoTree(tr, bt, adef)) return 1;
      }
    else {
      if (! makeUserTree(tr, bt, adef)) return 1;
      }

    if (! freeBestTree(bt)) return 1;
#   endif

/*  Endpoint for jumble and/or boot loop */

#   if Master
      tr->likelihood = 1.0;             /* terminate slaves */
      (void) sendTree(& comm_slave, tr);
#   endif

    freeTree(tr);

#   if Master
      close_link(& comm_slave);
      (void) fclose(INFILE);

      REPORT_STEP_TIME;
      endtime = p4_clock();
      printf("Execution time %d milliseconds\n", endtime - inputtime);
      (void) fclose(OUTFILE);
#   endif

#   if Slave
      close_link(& comm_master);
      (void) fclose(OUTFILE);
#   endif

#   if Debug
      (void) fclose(debug);
#   endif

#   if Master || Slave
      p4_send(DNAML_DONE, DNAML_HOST_ID, NULL, 0);
#   else
      return 0;
#   endif
  } /* DNA Maximum Likelihood */
fastDNAml_1.2.2/source/fastDNAml.h010064400000410000013000000223140703414346600171370ustar00garyarchae00000400000020
/*  fastDNAml.h  */

#define headerName     "fastDNAml.h"
#define headerVersion  "1.2.1"
#define headerDate     "March 9, 1998"

#ifndef dnaml_h

/*  Compile time switches for various updates to program:
 *    0 gives original version
 *    1 gives new version
 */

#define ReturnSmoothedView    1  /* Propagate changes back after smooth */
#define BestInsertAverage     1  /* Build three taxon tree analytically */
#define DeleteCheckpointFile  0  /* Remove checkpoint file when done */

#define Debug                 0

/*  Program constants and parameters  */

#define maxlogf       1024  /* maximum number of user trees */
#define maxcategories   35  /* maximum number of site types */

#define smoothings      32  /* maximum smoothing passes through tree */
#define iterations      10  /* maximum iterations of makenewz per insert */
#define newzpercycle     1  /* iterations of makenewz per tree traversal */
#define nmlngth         10  /* number of characters in species name */
#define deltaz     0.00001  /* test of net branch length change in update */
#define zmin 
1.0E-15  /* max branch prop. to -log(zmin) (= 34) */
#define zmax (1.0 - 1.0E-6)  /* min branch prop. to 1.0-zmax (= 1.0E-6) */
#define defaultz       0.9  /* value of z assigned as starting point */
#define unlikely  -1.0E300  /* low likelihood for initialization */

/*  These values are used to rescale the likelihoods at a given site so that
 *  there is no floating point underflow.
 */
#define twotothe256  \
115792089237316195423570985008687907853269984665640564039457584007913129639936.0
                                                     /*  2**256 (exactly)  */
#define minlikelihood     (1.0/twotothe256)    /*  2**(-256)  */
#define log_minlikelihood (-177.445678223345993274)  /* log(1.0/twotothe256) */

/*  The next two values are used for scaling the tree that is sketched in the
 *  output file.
 */
#define down             2
#define over            60

#define checkpointname "checkpoint"

#define badEval        1.0
#define badZ           0.0
#define badRear         -1
#define badSigma      -1.0

#define TRUE             1
#define FALSE            0

#define treeNone         0
#define treeNewick       1
#define treeProlog       2
#define treePHYLIP       3
#define treeMaxType      3
#define treeDefType  treePHYLIP

#define ABS(x)    (((x)<0)   ?  (-(x)) : (x))
#define MIN(x,y)  (((x)<(y)) ?    (x)  : (y))
#define MAX(x,y)  (((x)>(y)) ?    (x)  : (y))
#define LOG(x)    (((x)>0)   ? log(x)  : hang("log domain error"))
#define NINT(x)   ((int) ((x)>0 ? ((x)+0.5) : ((x)-0.5)))

#if  ! 
Vectorize
  typedef  char  yType;
#else
  typedef  int   yType;
#endif

typedef  int     boolean;
typedef  double  xtype;

typedef  struct  likelihood_vector {
    xtype        a, c, g, t;
    long         exp;
} likelivector;

typedef  struct  xmantyp {
    struct xmantyp  *prev;
    struct xmantyp  *next;
    struct noderec  *owner;
    likelivector    *lv;
} xarray;

typedef  struct  noderec {
    double           z, z0;
    struct noderec  *next;
    struct noderec  *back;
    int              number;
    xarray          *x;
    int              xcoord, ycoord, ymin, ymax;
    char             name[nmlngth+1]; /*  Space for null termination  */
    yType           *tip;             /*  Pointer to sequence data  */
} node, *nodeptr;

typedef  struct {
    int              numsp;       /* number of species (also tr->mxtips) */
    int              sites;       /* number of input sequence positions */
    yType          **y;           /* sequence data array */
    boolean          freqread;    /* user base frequencies have been read */
/*  To do: DNA specific values should get packaged into structure */
    double           freqa, freqc, freqg, freqt,  /* base frequencies */
                     freqr, freqy, invfreqr, invfreqy,
                     freqar, freqcy, freqgr, freqty;
    double           ttratio, xi, xv, fracchange; /* transition/transversion */
/*  End of DNA specific values */
    int             *wgt;         /* weight per sequence pos */
    int             *wgt2;        /* weight per pos (booted) */
    int              categs;      /* number of rate categories */
    double           catrat[maxcategories+1]; /* rates per categories */
    int             *sitecat;     /* category per sequence pos */
} rawdata;

typedef  struct {
    int             *alias;       /* site representing a pattern */
    int             *aliaswgt;    /* weight by pattern */
    int              endsite;     /* # of sequence patterns */
    int              wgtsum;      /* sum of weights of positions */
    int             *patcat;      /* category per pattern */
    double          *patrat;      /* rates per pattern */
    double          *wr;          /* weighted rate per pattern */
    double          *wr2;         /* weight*rate**2 per pattern */
} cruncheddata;

typedef  struct {
    double           likelihood;
    double          *log_f;       /* info for signif. 
of trees */
    node           **nodep;
    node            *start;
    node            *outgrnode;
    int              mxtips;
    int              ntips;
    int              nextnode;
    int              opt_level;
    int              log_f_valid; /* log_f value sites */
    int              global;      /* branches to cross in full tree */
    int              partswap;    /* branches to cross in partial tree */
    int              outgr;       /* sequence number to use in rooting tree */
    boolean          prelabeled;  /* the possible tip names are known */
    boolean          smoothed;
    boolean          rooted;
    boolean          userlen;     /* use user-supplied branch lengths */
    rawdata         *rdta;        /* raw data structure */
    cruncheddata    *cdta;        /* crunched data structure */
} tree;

typedef  struct conntyp {
    double           z;           /* branch length */
    node            *p, *q;       /* parent and child sectors */
    void            *valptr;      /* pointer to value of subtree */
    int              descend;     /* pointer to first connect of child */
    int              sibling;     /* next connect from same parent */
} connect, *connptr;

typedef  struct {
    double           likelihood;
    double          *log_f;       /* info for signif. of trees */
    connect         *links;       /* pointer to first connect (start) */
    node            *start;
    int              nextlink;    /* index of next available connect */
                                  /* tr->start = tpl->links->p */
    int              ntips;
    int              nextnode;
    int              opt_level;   /* degree of branch swapping explored */
    int              scrNum;      /* position in sorted list of scores */
    int              tplNum;      /* position in sorted list of trees */
    int              log_f_valid; /* log_f value sites */
    boolean          prelabeled;  /* the possible tip names are known */
    boolean          smoothed;    /* branch optimization converged? 
*/
} topol;

typedef  struct {
    double           best;        /* highest score saved */
    double           worst;       /* lowest score saved */
    topol           *start;       /* starting tree for optimization */
    topol          **byScore;
    topol          **byTopol;
    int              nkeep;       /* maximum topologies to save */
    int              nvalid;      /* number of topologies saved */
    int              ninit;       /* number of topologies initialized */
    int              numtrees;    /* number of alternatives tested */
    boolean          improved;
} bestlist;

typedef  struct {
    long             boot;        /* bootstrap random number seed */
    int              extra;       /* extra output information switch */
    boolean          empf;        /* use empirical base frequencies */
    boolean          interleaved; /* input data are in interleaved format */
    long             jumble;      /* jumble random number seed */
    int              nkeep;       /* number of best trees to keep */
    int              numutrees;   /* number of user trees to read */
    boolean          prdata;      /* echo data to output stream */
    boolean          qadd;        /* test addition without full smoothing */
    boolean          restart;     /* resume addition to partial tree */
    boolean          root;        /* use user-supplied outgroup */
    boolean          trprint;     /* print tree to output stream */
    int              trout;       /* write tree to "treefile" */
    boolean          usertree;    /* use user-supplied trees */
    boolean          userwgt;     /* use user-supplied position weight mask */
} analdef;

typedef  struct {
    double           tipmax;
    int              tipy;
} drawdata;

void  exit();

#if  ANSI || MALLOC_VOID
   void  *malloc();
#else
   char  *malloc();
#endif

#define Malloc(x)  malloc((unsigned) (x))      /* BSD */
/* #define Malloc(x)  malloc((size_t) (x)) */  /* System V */

#define Free(x)  (void) free((char *) (x))     /* BSD */
/* #define Free(x)  free((void *) (x)) */      /* System V */

char *likelihood_key   = "likelihood";
char *ntaxa_key        = "ntaxa";
char *opt_level_key    = "opt_level";
char *smoothed_key     = "smoothed";

#define dnaml_h
#endif  /* #if undef dnaml_h */
fastDNAml_1.2.2/source/Makefile010064400000410000013000000006130703415037500166100ustar00garyarchae00000400000020
# Makefile is courtesy of Marc Baudoin -- babafou@babafou.eu.org

# On many systems I prefer the gcc compiler over the native compiler
# CC = cc # or 
gcc
CFLAGS = -O
LDFLAGS = -lm
RM = rm -f

all : fastDNAml

fastDNAml : fastDNAml.o
	$(CC) $(CFLAGS) -o fastDNAml fastDNAml.o $(LDFLAGS)

fastDNAml.o : fastDNAml.c fastDNAml.h
	$(CC) $(CFLAGS) -c fastDNAml.c

clean :
	$(RM) fastDNAml.o
fastDNAml_1.2.2/testdata/004075500000410000013000000000000703414370500154635ustar00garyarchae00000400000020
fastDNAml_1.2.2/testdata/test5.out010064400000410000013000000041330703414347000172550ustar00garyarchae00000400000020
 fastDNAml, version 1.2.1, March 9, 1998 Based on Joseph Felsenstein's Nucleic acid sequence Maximum Likelihood method, version 3.3 5 Species, 114 Sites Quick add (only local branches initially optimized) in effect Rearrangements of partial trees may cross 1 branch. Rearrangements of full tree may cross 1 branch. Total weight of positions in analysis = 114 There are 41 distinct data patterns (columns) Empirical Base Frequencies: A 0.18570 C 0.24823 G 0.31783 T(U) 0.24823 Transition/transversion ratio = 2.000000 (Transition/transversion parameter = 1.571835) Adding species: Sequence1 Sequence2 Sequence3 Sequence4 Tested 3 alternative trees Ln Likelihood = -336.11996 Sequence5 Tested 5 alternative trees Ln Likelihood = -365.80850 Doing local rearrangements Tested 4 alternative trees Examined 13 trees +----- Sequence5 +-------------------------------3 ! +------------------------- Sequence4 ---2 ! +------------------ Sequence2 +-------------------------1 ! +------------------------- Sequence3 ! +------------------------------- Sequence1 Remember: this is an unrooted tree! Ln Likelihood = -365.80850 Between And Length Approx. 
Confidence Limits ------- --- ------ ------- ---------- ------ 2 3 0.08775 ( 0.02504, 0.15802) ** 3 Sequence5 0.01873 ( zero, 0.05038) * 3 Sequence4 0.06742 ( 0.01738, 0.12217) ** 2 1 0.06940 ( 0.01393, 0.13070) ** 1 Sequence2 0.05346 ( 0.00521, 0.10607) ** 1 Sequence3 0.06177 ( 0.01061, 0.11786) ** 2 Sequence1 0.08289 ( 0.02281, 0.14988) ** * = significantly positive, P < 0.05 ** = significantly positive, P < 0.01 Tree also written to treefile.24358 fastDNAml_1.2.2/testdata/test5.phy010064400000410000013000000013040703414347000172430ustar00garyarchae000004000000205 114 Sequence1 ACACGGTGTCGTATCATGCTGCAGGATGCTAGACTGCGTCANATGTTCGTACTAACTGTG Sequence2 ACGCGGTGTCGTGTCATGCTACATTATGCTAGACTGCGTCGGATGCTCGTATTGACTGCG Sequence3 ACGCGGTGCCGTGTNATGCTGCATTATGCTCGACTGCGRCGGATGCTAGTATTGACTGCG Sequence4 ACGCGCTGCCGTGTCATCCTACACGATGCYAGACAGCGTCAGCTGCTAGTACTGGCTGAG Sequence5 ACGCGCTGTCGTGTCATACTGCAGGATGCTAGACTGCGTCAGCTGCTAGTACTGGCTGAG AGCTCGATGATCGGTGACGTAGACTCAGGGGCCATGCCGCGAGTTTGCGATGCG AGCACGGTGATCAATGACGTAGNCTCAGGRTCCACGCCGTGACTTTGTGATNCG AGCACGATGACCGATGACGTAGACTGAGGGTCCGTGCCGCGACTTTGTGATGCG ACCTCGGTGATTGATGACGTAGACTGCGGGTCCATGCCGCGATTTTGCGRTGCG ACCTCGATGCTCGATGACGTAGACTGCGGGTCCATGCCGTGATTTTGCGATGCG fastDNAml_1.2.2/testdata/test5B.out010064400000410000013000000042150703414347000173600ustar00garyarchae00000400000020 fastDNAml, version 1.2.1, March 9, 1998 Based on Joseph Felsenstein's Nucleic acid sequence Maximum Likelihood method, version 3.3 5 Species, 114 Sites Bootstrap random number seed = 137 Quick add (only local branches initially optimized) in effect Rearrangements of partial trees may cross 1 branch. Rearrangements of full tree may cross 1 branch. 
Total weight of positions in analysis = 114 There are 29 distinct data patterns (columns) Empirical Base Frequencies: A 0.15961 C 0.27385 G 0.31035 T(U) 0.25618 Transition/transversion ratio = 2.000000 (Transition/transversion parameter = 1.591924) Adding species: Sequence1 Sequence2 Sequence3 Sequence4 Tested 3 alternative trees Ln Likelihood = -353.03694 Sequence5 Tested 5 alternative trees Ln Likelihood = -388.06824 Doing local rearrangements Tested 4 alternative trees Ln Likelihood = -383.09705 Tested 4 alternative trees Examined 17 trees +---- Sequence5 +------------------3 ! +------------------ Sequence4 ---1 ! +------------------------------------------ Sequence3 +------------------2 ! +------------- Sequence2 ! +-------------------------------- Sequence1 Remember: this is an unrooted tree! Ln Likelihood = -383.09705 Between And Length Approx. Confidence Limits ------- --- ------ ------- ---------- ------ 1 3 0.06069 ( 0.00570, 0.12148) ** 3 Sequence5 0.01740 ( zero, 0.04945) * 3 Sequence4 0.07628 ( 0.02364, 0.13421) ** 1 2 0.06623 ( 0.00889, 0.12989) ** 2 Sequence3 0.14394 ( 0.06588, 0.23426) ** 2 Sequence2 0.04896 ( zero, 0.10507) ** 1 Sequence1 0.11844 ( 0.04707, 0.19992) ** * = significantly positive, P < 0.05 ** = significantly positive, P < 0.01 Tree also written to treefile.24361 fastDNAml_1.2.2/testdata/test5B.phy010064400000410000013000000013150703414347100173500ustar00garyarchae000004000000205 114 B B 137 Sequence1 ACACGGTGTCGTATCATGCTGCAGGATGCTAGACTGCGTCANATGTTCGTACTAACTGTG Sequence2 ACGCGGTGTCGTGTCATGCTACATTATGCTAGACTGCGTCGGATGCTCGTATTGACTGCG Sequence3 ACGCGGTGCCGTGTNATGCTGCATTATGCTCGACTGCGRCGGATGCTAGTATTGACTGCG Sequence4 ACGCGCTGCCGTGTCATCCTACACGATGCYAGACAGCGTCAGCTGCTAGTACTGGCTGAG Sequence5 ACGCGCTGTCGTGTCATACTGCAGGATGCTAGACTGCGTCAGCTGCTAGTACTGGCTGAG AGCTCGATGATCGGTGACGTAGACTCAGGGGCCATGCCGCGAGTTTGCGATGCG AGCACGGTGATCAATGACGTAGNCTCAGGRTCCACGCCGTGACTTTGTGATNCG AGCACGATGACCGATGACGTAGACTGAGGGTCCGTGCCGCGACTTTGTGATGCG 
ACCTCGGTGATTGATGACGTAGACTGCGGGTCCATGCCGCGATTTTGCGRTGCG ACCTCGATGCTCGATGACGTAGACTGCGGGTCCATGCCGTGATTTTGCGATGCG fastDNAml_1.2.2/testdata/test5C.out010064400000410000013000000047720703414347100173720ustar00garyarchae00000400000020 fastDNAml, version 1.2.1, March 9, 1998 Based on Joseph Felsenstein's Nucleic acid sequence Maximum Likelihood method, version 3.3 5 Species, 114 Sites Quick add (only local branches initially optimized) in effect Site category Rate of change 1 0.062 2 0.125 3 0.250 4 0.500 5 1.000 6 2.000 7 4.000 8 8.000 9 16.000 A 32.000 Rearrangements of partial trees may cross 1 branch. Rearrangements of full tree may cross 1 branch. Total weight of positions in analysis = 114 There are 73 distinct data patterns (columns) Empirical Base Frequencies: A 0.18570 C 0.24823 G 0.31783 T(U) 0.24823 Transition/transversion ratio = 2.000000 (Transition/transversion parameter = 1.571835) Adding species: Sequence1 Sequence2 Sequence3 Sequence4 Tested 3 alternative trees Ln Likelihood = -381.89787 Sequence5 Tested 5 alternative trees Ln Likelihood = -418.08178 Doing local rearrangements Tested 4 alternative trees Ln Likelihood = -412.77783 Tested 4 alternative trees Examined 17 trees + Sequence5 +-----------------------------------------------------3 ! + Sequence4 ---1 ! +----------------------------------------------------- Sequence3 +--2 ! +----------------------------------------------------- Sequence2 ! +----------------------------------------------------- Sequence1 Remember: this is an unrooted tree! Ln Likelihood = -412.77783 Between And Length Approx. 
Confidence Limits ------- --- ------ ------- ---------- ------ 1 3 0.01052 ( 0.00196, 0.01921) ** 3 Sequence5 0.00190 ( zero, 0.00530) ** 3 Sequence4 0.00816 ( 0.00172, 0.01468) ** 1 2 0.00773 ( 0.00062, 0.01493) ** 2 Sequence3 0.00748 ( 0.00094, 0.01410) ** 2 Sequence2 0.00620 ( 0.00021, 0.01225) ** 1 Sequence1 0.01103 ( 0.00235, 0.01983) ** * = significantly positive, P < 0.05 ** = significantly positive, P < 0.01 Tree also written to treefile.24364 fastDNAml_1.2.2/testdata/test5C.phy010064400000410000013000000016070703414347100173550ustar00garyarchae000004000000205 114 C C 10 0.0625 0.125 0.25 0.5 1 2 4 8 16 32 Categories 5111136343678975AAA8949995566778888889AAAAAA9239898629AAAAA9 633792246624457364222574877188898132984963499AA9899975 Sequence1 ACACGGTGTCGTATCATGCTGCAGGATGCTAGACTGCGTCANATGTTCGTACTAACTGTG Sequence2 ACGCGGTGTCGTGTCATGCTACATTATGCTAGACTGCGTCGGATGCTCGTATTGACTGCG Sequence3 ACGCGGTGCCGTGTNATGCTGCATTATGCTCGACTGCGRCGGATGCTAGTATTGACTGCG Sequence4 ACGCGCTGCCGTGTCATCCTACACGATGCYAGACAGCGTCAGCTGCTAGTACTGGCTGAG Sequence5 ACGCGCTGTCGTGTCATACTGCAGGATGCTAGACTGCGTCAGCTGCTAGTACTGGCTGAG AGCTCGATGATCGGTGACGTAGACTCAGGGGCCATGCCGCGAGTTTGCGATGCG AGCACGGTGATCAATGACGTAGNCTCAGGRTCCACGCCGTGACTTTGTGATNCG AGCACGATGACCGATGACGTAGACTGAGGGTCCGTGCCGCGACTTTGTGATGCG ACCTCGGTGATTGATGACGTAGACTGCGGGTCCATGCCGCGATTTTGCGRTGCG ACCTCGATGCTCGATGACGTAGACTGCGGGTCCATGCCGTGATTTTGCGATGCG fastDNAml_1.2.2/testdata/test5GRT.out010064400000410000013000000040540703414347100176350ustar00garyarchae00000400000020 fastDNAml, version 1.2.1, March 9, 1998 Based on Joseph Felsenstein's Nucleic acid sequence Maximum Likelihood method, version 3.3 5 Species, 114 Sites Quick add (only local branches initially optimized) in effect Restart option in effect. Sequence addition will start from appended tree. Rearrangements of partial trees may cross 0 branches. 
Rearrangements of full tree may cross 0 branches. Total weight of positions in analysis = 114 There are 41 distinct data patterns (columns) Empirical Base Frequencies: A 0.18570 C 0.24823 G 0.31783 T(U) 0.24823 Transition/transversion ratio = 2.000000 (Transition/transversion parameter = 1.571835) Restarting from tree with the following sequences: Sequence1 Sequence2 Sequence4 Sequence5 Adding species: Sequence3 Tested 5 alternative trees Ln Likelihood = -379.14992 Examined 6 trees +--------- Sequence5 ---2 ! +------------------- Sequence4 +--1 ! ! +-------------- Sequence3 ! +--------------------------------------3 ! +------------------- Sequence2 ! +------------------------------------------- Sequence1 Remember: this is an unrooted tree! Ln Likelihood = -379.14992 Between And Length Approx. Confidence Limits ------- --- ------ ------- ---------- ------ 2 Sequence5 0.02776 ( zero, 0.06734) ** 2 1 0.00396 ( zero, 0.03594) 1 Sequence4 0.06711 ( 0.01506, 0.12427) ** 1 3 0.13351 ( 0.05915, 0.21875) ** 3 Sequence3 0.04953 ( 0.00347, 0.09953) ** 3 Sequence2 0.06542 ( 0.01357, 0.12233) ** 2 Sequence1 0.15529 ( 0.07648, 0.24643) ** * = significantly positive, P < 0.05 ** = significantly positive, P < 0.01 Tree also written to treefile.24367 fastDNAml_1.2.2/testdata/test5GRT.phy010064400000410000013000000014340703414347200176260ustar00garyarchae000004000000205 114 R G T G 0 0 T 2.0 Sequence1 ACACGGTGTCGTATCATGCTGCAGGATGCTAGACTGCGTCANATGTTCGTACTAACTGTG Sequence2 ACGCGGTGTCGTGTCATGCTACATTATGCTAGACTGCGTCGGATGCTCGTATTGACTGCG Sequence3 ACGCGGTGCCGTGTNATGCTGCATTATGCTCGACTGCGRCGGATGCTAGTATTGACTGCG Sequence4 ACGCGCTGCCGTGTCATCCTACACGATGCYAGACAGCGTCAGCTGCTAGTACTGGCTGAG Sequence5 ACGCGCTGTCGTGTCATACTGCAGGATGCTAGACTGCGTCAGCTGCTAGTACTGGCTGAG AGCTCGATGATCGGTGACGTAGACTCAGGGGCCATGCCGCGAGTTTGCGATGCG AGCACGGTGATCAATGACGTAGNCTCAGGRTCCACGCCGTGACTTTGTGATNCG AGCACGATGACCGATGACGTAGACTGAGGGTCCGTGCCGCGACTTTGTGATGCG ACCTCGGTGATTGATGACGTAGACTGCGGGTCCATGCCGCGATTTTGCGRTGCG 
ACCTCGATGCTCGATGACGTAGACTGCGGGTCCATGCCGTGATTTTGCGATGCG (Sequence4:0.1,Sequence2:0.1,(Sequence1:0.1,Sequence5:0.1):0.1):0.0; fastDNAml_1.2.2/testdata/test5I.out010064400000410000013000000041330703414347200173700ustar00garyarchae00000400000020 fastDNAml, version 1.2.1, March 9, 1998 Based on Joseph Felsenstein's Nucleic acid sequence Maximum Likelihood method, version 3.3 5 Species, 114 Sites Quick add (only local branches initially optimized) in effect Rearrangements of partial trees may cross 1 branch. Rearrangements of full tree may cross 1 branch. Total weight of positions in analysis = 114 There are 41 distinct data patterns (columns) Empirical Base Frequencies: A 0.18570 C 0.24823 G 0.31783 T(U) 0.24823 Transition/transversion ratio = 2.000000 (Transition/transversion parameter = 1.571835) Adding species: Sequence1 Sequence2 Sequence3 Sequence4 Tested 3 alternative trees Ln Likelihood = -336.11996 Sequence5 Tested 5 alternative trees Ln Likelihood = -365.80850 Doing local rearrangements Tested 4 alternative trees Examined 13 trees +----- Sequence5 +-------------------------------3 ! +------------------------- Sequence4 ---2 ! +------------------ Sequence2 +-------------------------1 ! +------------------------- Sequence3 ! +------------------------------- Sequence1 Remember: this is an unrooted tree! Ln Likelihood = -365.80850 Between And Length Approx. 
Confidence Limits
 -------        ---            ------      ------- ---------- ------
     2          3               0.08775     (  0.02504,     0.15802) **
     3          Sequence5       0.01873     (     zero,     0.05038) *
     3          Sequence4       0.06742     (  0.01738,     0.12217) **
     2          1               0.06940     (  0.01393,     0.13070) **
     1          Sequence2       0.05346     (  0.00521,     0.10607) **
     1          Sequence3       0.06177     (  0.01061,     0.11786) **
     2          Sequence1       0.08289     (  0.02281,     0.14988) **

     *  = significantly positive, P < 0.05
     ** = significantly positive, P < 0.01

Tree also written to treefile.24370

fastDNAml_1.2.2/testdata/test5I.phy
5 114 I
Sequence1 ACACGGTGTCGTATCATGCTGCAGGATGCTAGACTGCGTCANATGTTCGTACTAACTGTG
AGCTCGATGATCGGTGACGTAGACTCAGGGGCCATGCCGCGAGTTTGCGATGCG
Sequence2 ACGCGGTGTCGTGTCATGCTACATTATGCTAGACTGCGTCGGATGCTCGTATTGACTGCG
AGCACGGTGATCAATGACGTAGNCTCAGGRTCCACGCCGTGACTTTGTGATNCG
Sequence3 ACGCGGTGCCGTGTNATGCTGCATTATGCTCGACTGCGRCGGATGCTAGTATTGACTGCG
AGCACGATGACCGATGACGTAGACTGAGGGTCCGTGCCGCGACTTTGTGATGCG
Sequence4 ACGCGCTGCCGTGTCATCCTACACGATGCYAGACAGCGTCAGCTGCTAGTACTGGCTGAG
ACCTCGGTGATTGATGACGTAGACTGCGGGTCCATGCCGCGATTTTGCGRTGCG
Sequence5 ACGCGCTGTCGTGTCATACTGCAGGATGCTAGACTGCGTCAGCTGCTAGTACTGGCTGAG
ACCTCGATGCTCGATGACGTAGACTGCGGGTCCATGCCGTGATTTTGCGATGCG

fastDNAml_1.2.2/testdata/test5I2.out
fastDNAml, version 1.2.1, March 9, 1998

Based on Joseph Felsenstein's

Nucleic acid sequence Maximum Likelihood method, version 3.3

5 Species, 114 Sites

Quick add (only local branches initially optimized) in effect
Rearrangements of partial trees may cross 1 branch.
Rearrangements of full tree may cross 1 branch.
Total weight of positions in analysis = 114 There are 41 distinct data patterns (columns) Empirical Base Frequencies: A 0.18570 C 0.24823 G 0.31783 T(U) 0.24823 Transition/transversion ratio = 2.000000 (Transition/transversion parameter = 1.571835) Adding species: Sequence1 Sequence2 Sequence3 Sequence4 Tested 3 alternative trees Ln Likelihood = -336.11996 Sequence5 Tested 5 alternative trees Ln Likelihood = -365.80850 Doing local rearrangements Tested 4 alternative trees Examined 13 trees +----- Sequence5 +-------------------------------3 ! +------------------------- Sequence4 ---2 ! +------------------ Sequence2 +-------------------------1 ! +------------------------- Sequence3 ! +------------------------------- Sequence1 Remember: this is an unrooted tree! Ln Likelihood = -365.80850 Between And Length Approx. Confidence Limits ------- --- ------ ------- ---------- ------ 2 3 0.08775 ( 0.02504, 0.15802) ** 3 Sequence5 0.01873 ( zero, 0.05038) * 3 Sequence4 0.06742 ( 0.01738, 0.12217) ** 2 1 0.06940 ( 0.01393, 0.13070) ** 1 Sequence2 0.05346 ( 0.00521, 0.10607) ** 1 Sequence3 0.06177 ( 0.01061, 0.11786) ** 2 Sequence1 0.08289 ( 0.02281, 0.14988) ** * = significantly positive, P < 0.05 ** = significantly positive, P < 0.01 Tree also written to treefile.24373 fastDNAml_1.2.2/testdata/test5I2.phy010064400000410000013000000014400703414347300174420ustar00garyarchae000004000000205 114 I Sequence1 1 ACACGGTGTC GTATCATGCT GCAGGATGCT AGACTGCGTC ANATGTTCGT ACTAACTGTG 61 AGCTCGATGA TCGGTGACGT AGACTCAGGG GCCATGCCGC GAGTTTGCGA TGCG Sequence2 1 ACGCGGTGTC GTGTCATGCT ACATTATGCT AGACTGCGTC GGATGCTCGT ATTGACTGCG 61 AGCACGGTGA TCAATGACGT AGNCTCAGGR TCCACGCCGT GACTTTGTGA TNCG Sequence3 1 ACGCGGTGCC GTGTNATGCT GCATTATGCT CGACTGCGRC GGATGCTAGT ATTGACTGCG 61 AGCACGATGA CCGATGACGT AGACTGAGGG TCCGTGCCGC GACTTTGTGA TGCG Sequence4 1 ACGCGCTGCC GTGTCATCCT ACACGATGCY AGACAGCGTC AGCTGCTAGT ACTGGCTGAG 61 ACCTCGGTGA TTGATGACGT AGACTGCGGG TCCATGCCGC GATTTTGCGR TGCG Sequence5 1 ACGCGCTGTC 
GTGTCATACT GCAGGATGCT AGACTGCGTC AGCTGCTAGT ACTGGCTGAG 61 ACCTCGATGC TCGATGACGT AGACTGCGGG TCCATGCCGT GATTTTGCGA TGCG fastDNAml_1.2.2/testdata/test5J.out010064400000410000013000000041740703414347300173770ustar00garyarchae00000400000020 fastDNAml, version 1.2.1, March 9, 1998 Based on Joseph Felsenstein's Nucleic acid sequence Maximum Likelihood method, version 3.3 5 Species, 114 Sites Jumble random number seed = 137 Quick add (only local branches initially optimized) in effect Rearrangements of partial trees may cross 1 branch. Rearrangements of full tree may cross 1 branch. Total weight of positions in analysis = 114 There are 41 distinct data patterns (columns) Empirical Base Frequencies: A 0.18570 C 0.24823 G 0.31783 T(U) 0.24823 Transition/transversion ratio = 2.000000 (Transition/transversion parameter = 1.571835) Adding species: Sequence4 Sequence2 Sequence5 Sequence1 Tested 3 alternative trees Ln Likelihood = -323.61247 Sequence3 Tested 5 alternative trees Ln Likelihood = -365.80850 Doing local rearrangements Tested 4 alternative trees Examined 13 trees +----- Sequence5 +-------------------------------1 ! +------------------------- Sequence4 ---2 ! +------------------ Sequence2 +-------------------------3 ! +------------------------- Sequence3 ! +------------------------------- Sequence1 Remember: this is an unrooted tree! Ln Likelihood = -365.80850 Between And Length Approx. 
Confidence Limits
 -------        ---            ------      ------- ---------- ------
     2          1               0.08775     (  0.02504,     0.15802) **
     1          Sequence5       0.01873     (     zero,     0.05038) *
     1          Sequence4       0.06742     (  0.01738,     0.12217) **
     2          3               0.06940     (  0.01393,     0.13070) **
     3          Sequence2       0.05346     (  0.00521,     0.10607) **
     3          Sequence3       0.06177     (  0.01061,     0.11786) **
     2          Sequence1       0.08289     (  0.02281,     0.14988) **

     *  = significantly positive, P < 0.05
     ** = significantly positive, P < 0.01

Tree also written to treefile.24376

fastDNAml_1.2.2/testdata/test5J.phy
5 114 J
J 137
Sequence1 ACACGGTGTCGTATCATGCTGCAGGATGCTAGACTGCGTCANATGTTCGTACTAACTGTG
Sequence2 ACGCGGTGTCGTGTCATGCTACATTATGCTAGACTGCGTCGGATGCTCGTATTGACTGCG
Sequence3 ACGCGGTGCCGTGTNATGCTGCATTATGCTCGACTGCGRCGGATGCTAGTATTGACTGCG
Sequence4 ACGCGCTGCCGTGTCATCCTACACGATGCYAGACAGCGTCAGCTGCTAGTACTGGCTGAG
Sequence5 ACGCGCTGTCGTGTCATACTGCAGGATGCTAGACTGCGTCAGCTGCTAGTACTGGCTGAG
AGCTCGATGATCGGTGACGTAGACTCAGGGGCCATGCCGCGAGTTTGCGATGCG
AGCACGGTGATCAATGACGTAGNCTCAGGRTCCACGCCGTGACTTTGTGATNCG
AGCACGATGACCGATGACGTAGACTGAGGGTCCGTGCCGCGACTTTGTGATGCG
ACCTCGGTGATTGATGACGTAGACTGCGGGTCCATGCCGCGATTTTGCGRTGCG
ACCTCGATGCTCGATGACGTAGACTGCGGGTCCATGCCGTGATTTTGCGATGCG

fastDNAml_1.2.2/testdata/test5R.out
fastDNAml, version 1.2.1, March 9, 1998

Based on Joseph Felsenstein's

Nucleic acid sequence Maximum Likelihood method, version 3.3

5 Species, 114 Sites

Quick add (only local branches initially optimized) in effect
Restart option in effect.  Sequence addition will start from appended tree.
Rearrangements of partial trees may cross 1 branch.
Rearrangements of full tree may cross 1 branch.
Total weight of positions in analysis = 114 There are 41 distinct data patterns (columns) Empirical Base Frequencies: A 0.18570 C 0.24823 G 0.31783 T(U) 0.24823 Transition/transversion ratio = 2.000000 (Transition/transversion parameter = 1.571835) Restarting from tree with the following sequences: Sequence1 Sequence2 Sequence4 Sequence5 Adding species: Ln Likelihood = -339.03054 Doing local rearrangements Tested 2 alternative trees Ln Likelihood = -323.61247 Tested 2 alternative trees Sequence3 Tested 5 alternative trees Ln Likelihood = -365.80850 Doing local rearrangements Tested 4 alternative trees Examined 14 trees +------------------------- Sequence3 +-------------------------3 ! +------------------ Sequence2 ---2 ! +----- Sequence5 +-------------------------------1 ! +------------------------- Sequence4 ! +------------------------------- Sequence1 Remember: this is an unrooted tree! Ln Likelihood = -365.80850 Between And Length Approx. Confidence Limits ------- --- ------ ------- ---------- ------ 2 3 0.06940 ( 0.01393, 0.13070) ** 3 Sequence3 0.06177 ( 0.01061, 0.11786) ** 3 Sequence2 0.05346 ( 0.00521, 0.10607) ** 2 1 0.08775 ( 0.02504, 0.15802) ** 1 Sequence5 0.01873 ( zero, 0.05038) * 1 Sequence4 0.06742 ( 0.01738, 0.12217) ** 2 Sequence1 0.08289 ( 0.02281, 0.14988) ** * = significantly positive, P < 0.05 ** = significantly positive, P < 0.01 Tree also written to treefile.24379 fastDNAml_1.2.2/testdata/test5R.phy010064400000410000013000000014140703414347400173730ustar00garyarchae000004000000205 114 R Sequence1 ACACGGTGTCGTATCATGCTGCAGGATGCTAGACTGCGTCANATGTTCGTACTAACTGTG Sequence2 ACGCGGTGTCGTGTCATGCTACATTATGCTAGACTGCGTCGGATGCTCGTATTGACTGCG Sequence3 ACGCGGTGCCGTGTNATGCTGCATTATGCTCGACTGCGRCGGATGCTAGTATTGACTGCG Sequence4 ACGCGCTGCCGTGTCATCCTACACGATGCYAGACAGCGTCAGCTGCTAGTACTGGCTGAG Sequence5 ACGCGCTGTCGTGTCATACTGCAGGATGCTAGACTGCGTCAGCTGCTAGTACTGGCTGAG AGCTCGATGATCGGTGACGTAGACTCAGGGGCCATGCCGCGAGTTTGCGATGCG 
AGCACGGTGATCAATGACGTAGNCTCAGGRTCCACGCCGTGACTTTGTGATNCG AGCACGATGACCGATGACGTAGACTGAGGGTCCGTGCCGCGACTTTGTGATGCG ACCTCGGTGATTGATGACGTAGACTGCGGGTCCATGCCGCGATTTTGCGRTGCG ACCTCGATGCTCGATGACGTAGACTGCGGGTCCATGCCGTGATTTTGCGATGCG (Sequence4:0.1,Sequence2:0.1,(Sequence1:0.1,Sequence5:0.1):0.1):0.0; fastDNAml_1.2.2/testdata/test5TF.out010064400000410000013000000041610703414347400175140ustar00garyarchae00000400000020 fastDNAml, version 1.2.1, March 9, 1998 Based on Joseph Felsenstein's Nucleic acid sequence Maximum Likelihood method, version 3.3 5 Species, 114 Sites Quick add (only local branches initially optimized) in effect Rearrangements of partial trees may cross 1 branch. Rearrangements of full tree may cross 1 branch. Total weight of positions in analysis = 114 There are 41 distinct data patterns (columns) Base Frequencies: A 0.25000 C 0.25000 G 0.25000 T(U) 0.25000 Transition/transversion ratio = 0.501000 (Transition/transversion parameter = 0.001000) Adding species: Sequence1 Sequence2 Sequence3 Sequence4 Tested 3 alternative trees Ln Likelihood = -342.76283 Sequence5 Tested 5 alternative trees Ln Likelihood = -371.36406 Doing local rearrangements Tested 4 alternative trees Examined 13 trees +------ Sequence5 +----------------------------------3 ! +--------------------------- Sequence4 ---2 ! +--------------------------- Sequence2 +---------------------------1 ! +--------------------------- Sequence3 ! +---------------------------------- Sequence1 Remember: this is an unrooted tree! Ln Likelihood = -371.36406 Between And Length Approx. 
Confidence Limits ------- --- ------ ------- ---------- ------ 2 3 0.07952 ( 0.02075, 0.14329) ** 3 Sequence5 0.02019 ( zero, 0.05147) * 3 Sequence4 0.06439 ( 0.01576, 0.11638) ** 2 1 0.06948 ( 0.01584, 0.12726) ** 1 Sequence2 0.05617 ( 0.00822, 0.10739) ** 1 Sequence3 0.05714 ( 0.00917, 0.10839) ** 2 Sequence1 0.08148 ( 0.02287, 0.14505) ** * = significantly positive, P < 0.05 ** = significantly positive, P < 0.01 Tree also written to treefile.24382 fastDNAml_1.2.2/testdata/test5TF.phy010064400000410000013000000013470703414347400175100ustar00garyarchae000004000000205 114 T F T 0.501 F 0.25 0.25 0.25 0.25 Sequence1 ACACGGTGTCGTATCATGCTGCAGGATGCTAGACTGCGTCANATGTTCGTACTAACTGTG Sequence2 ACGCGGTGTCGTGTCATGCTACATTATGCTAGACTGCGTCGGATGCTCGTATTGACTGCG Sequence3 ACGCGGTGCCGTGTNATGCTGCATTATGCTCGACTGCGRCGGATGCTAGTATTGACTGCG Sequence4 ACGCGCTGCCGTGTCATCCTACACGATGCYAGACAGCGTCAGCTGCTAGTACTGGCTGAG Sequence5 ACGCGCTGTCGTGTCATACTGCAGGATGCTAGACTGCGTCAGCTGCTAGTACTGGCTGAG AGCTCGATGATCGGTGACGTAGACTCAGGGGCCATGCCGCGAGTTTGCGATGCG AGCACGGTGATCAATGACGTAGNCTCAGGRTCCACGCCGTGACTTTGTGATNCG AGCACGATGACCGATGACGTAGACTGAGGGTCCGTGCCGCGACTTTGTGATGCG ACCTCGGTGATTGATGACGTAGACTGCGGGTCCATGCCGCGATTTTGCGRTGCG ACCTCGATGCTCGATGACGTAGACTGCGGGTCCATGCCGTGATTTTGCGATGCG fastDNAml_1.2.2/testdata/test5U.out010064400000410000013000000126450703414347500174160ustar00garyarchae00000400000020 fastDNAml, version 1.2.1, March 9, 1998 Based on Joseph Felsenstein's Nucleic acid sequence Maximum Likelihood method, version 3.3 5 Species, 114 Sites Quick add (only local branches initially optimized) in effect User-supplied tree topology will be used. Total weight of positions in analysis = 114 There are 41 distinct data patterns (columns) Empirical Base Frequencies: A 0.18570 C 0.24823 G 0.31783 T(U) 0.24823 Transition/transversion ratio = 2.000000 (Transition/transversion parameter = 1.571835) User-defined trees: +------------------ Sequence2 +-------------------------2 ! +------------------------- Sequence3 ---1 ! 
+------------------------- Sequence4 +-------------------------------3 ! +----- Sequence5 ! +------------------------------- Sequence1 Remember: this is an unrooted tree! Ln Likelihood = -365.80850 Between And Length Approx. Confidence Limits ------- --- ------ ------- ---------- ------ 1 2 0.06940 ( 0.01393, 0.13070) ** 2 Sequence2 0.05346 ( 0.00521, 0.10607) ** 2 Sequence3 0.06177 ( 0.01061, 0.11786) ** 1 3 0.08775 ( 0.02504, 0.15802) ** 3 Sequence4 0.06742 ( 0.01738, 0.12217) ** 3 Sequence5 0.01873 ( zero, 0.05038) * 1 Sequence1 0.08289 ( 0.02281, 0.14988) ** * = significantly positive, P < 0.05 ** = significantly positive, P < 0.01 +------------------- Sequence3 ! ---2 +------------------- Sequence4 ! +---------------------------------------3 +--1 +--------- Sequence5 ! ! ! +------------------- Sequence2 ! +--------------------------------------- Sequence1 Remember: this is an unrooted tree! Ln Likelihood = -374.57884 Between And Length Approx. Confidence Limits ------- --- ------ ------- ---------- ------ 2 Sequence3 0.06867 ( 0.01299, 0.13022) ** 2 1 0.00000 ( zero, 0.02880) 1 3 0.14082 ( 0.06276, 0.23097) ** 3 Sequence4 0.06155 ( 0.01197, 0.11574) ** 3 Sequence5 0.02596 ( zero, 0.06481) ** 1 Sequence2 0.06268 ( 0.00894, 0.12187) ** 2 Sequence1 0.14157 ( 0.06516, 0.22951) ** * = significantly positive, P < 0.05 ** = significantly positive, P < 0.01 +------------- Sequence2 ! ---2 +------------------ Sequence4 ! +-------------------------------------3 +----1 +-------- Sequence5 ! ! ! +------------------ Sequence3 ! +------------------------------------- Sequence1 Remember: this is an unrooted tree! Ln Likelihood = -372.26531 Between And Length Approx. 
Confidence Limits ------- --- ------ ------- ---------- ------ 2 Sequence2 0.05076 ( 0.00335, 0.10237) ** 2 1 0.02085 ( zero, 0.05426) * 1 3 0.13281 ( 0.05866, 0.21777) ** 3 Sequence4 0.05946 ( 0.01021, 0.11325) ** 3 Sequence5 0.02831 ( zero, 0.06896) ** 1 Sequence3 0.05492 ( 0.00578, 0.10858) ** 2 Sequence1 0.13541 ( 0.06223, 0.21911) ** * = significantly positive, P < 0.05 ** = significantly positive, P < 0.01 +---------------------------------------------------- Sequence2 +-------2 ! +---------------------------------------------------- Sequence4 ---1 ! +---------------------------------------------------- Sequence3 +--3 ! +-------------------------------------------- Sequence5 ! +-------------------------------------------- Sequence1 Remember: this is an unrooted tree! Ln Likelihood = -399.62271 Between And Length Approx. Confidence Limits ------- --- ------ ------- ---------- ------ 1 2 0.01197 ( zero, 0.04296) 2 Sequence2 0.11578 ( 0.04568, 0.19547) ** 2 Sequence4 0.12038 ( 0.04919, 0.20149) ** 1 3 0.00000 ( zero, 0.02449) 3 Sequence3 0.10862 ( 0.04249, 0.18322) ** 3 Sequence5 0.09735 ( 0.03466, 0.16759) ** 1 Sequence1 0.10726 ( 0.04319, 0.17925) ** * = significantly positive, P < 0.05 ** = significantly positive, P < 0.01 Tree also written to treefile.24385 Tree Ln L Diff Ln L Its S.D. Significantly worse? 
   1    -365.80850  <------ best
   2    -372.26531    -6.45680     6.6552          No
   3    -374.57884    -8.77033     5.3864          No
   4    -399.62271   -33.81420    11.7723          Yes

fastDNAml_1.2.2/testdata/test5U.phy
5 114 U
Sequence1 ACACGGTGTCGTATCATGCTGCAGGATGCTAGACTGCGTCANATGTTCGTACTAACTGTG
Sequence2 ACGCGGTGTCGTGTCATGCTACATTATGCTAGACTGCGTCGGATGCTCGTATTGACTGCG
Sequence3 ACGCGGTGCCGTGTNATGCTGCATTATGCTCGACTGCGRCGGATGCTAGTATTGACTGCG
Sequence4 ACGCGCTGCCGTGTCATCCTACACGATGCYAGACAGCGTCAGCTGCTAGTACTGGCTGAG
Sequence5 ACGCGCTGTCGTGTCATACTGCAGGATGCTAGACTGCGTCAGCTGCTAGTACTGGCTGAG
AGCTCGATGATCGGTGACGTAGACTCAGGGGCCATGCCGCGAGTTTGCGATGCG
AGCACGGTGATCAATGACGTAGNCTCAGGRTCCACGCCGTGACTTTGTGATNCG
AGCACGATGACCGATGACGTAGACTGAGGGTCCGTGCCGCGACTTTGTGATGCG
ACCTCGGTGATTGATGACGTAGACTGCGGGTCCATGCCGCGATTTTGCGRTGCG
ACCTCGATGCTCGATGACGTAGACTGCGGGTCCATGCCGTGATTTTGCGATGCG
4
(Sequence1,(Sequence2,Sequence3),(Sequence4,Sequence5));
(Sequence2,(Sequence1,Sequence3),(Sequence4,Sequence5));
(Sequence3,(Sequence1,Sequence2),(Sequence4,Sequence5));
(Sequence1,(Sequence2,Sequence4),(Sequence3,Sequence5));

fastDNAml_1.2.2/testdata/test5W.out
fastDNAml, version 1.2.1, March 9, 1998

Based on Joseph Felsenstein's

Nucleic acid sequence Maximum Likelihood method, version 3.3

5 Species, 114 Sites

Quick add (only local branches initially optimized) in effect
Rearrangements of partial trees may cross 1 branch.
Rearrangements of full tree may cross 1 branch.
Total weight of positions in analysis = 60 There are 25 distinct data patterns (columns) Empirical Base Frequencies: A 0.21812 C 0.25686 G 0.30872 T(U) 0.21630 Transition/transversion ratio = 2.000000 (Transition/transversion parameter = 1.531841) Adding species: Sequence1 Sequence2 Sequence3 Sequence4 Tested 3 alternative trees Ln Likelihood = -175.49147 Sequence5 Tested 5 alternative trees Ln Likelihood = -188.51861 Doing local rearrangements Tested 4 alternative trees Examined 13 trees +------ Sequence5 +------------------------------------3 ! +--------------------- Sequence4 ---2 ! +--------------------- Sequence2 +---------------------1 ! +------------------------------------ Sequence3 ! +------------------------------------ Sequence1 Remember: this is an unrooted tree! Ln Likelihood = -188.51861 Between And Length Approx. Confidence Limits ------- --- ------ ------- ---------- ------ 2 3 0.08381 ( 0.00311, 0.17730) ** 3 Sequence5 0.01801 ( zero, 0.05604) * 3 Sequence4 0.05155 ( zero, 0.11574) ** 2 1 0.04339 ( zero, 0.11187) ** 1 Sequence2 0.05938 ( zero, 0.13473) ** 1 Sequence3 0.08823 ( 0.00817, 0.18086) ** 2 Sequence1 0.07614 ( 0.00187, 0.16110) ** * = significantly positive, P < 0.05 ** = significantly positive, P < 0.01 Tree also written to treefile.24388 fastDNAml_1.2.2/testdata/test5W.phy010064400000410000013000000015230703414347600174030ustar00garyarchae000004000000205 114 W Weights 111111111111001100000100011111100000000000000110000110000000 111101111111111111111111011100000111001011100000000011 Sequence1 ACACGGTGTCGTATCATGCTGCAGGATGCTAGACTGCGTCANATGTTCGTACTAACTGTG Sequence2 ACGCGGTGTCGTGTCATGCTACATTATGCTAGACTGCGTCGGATGCTCGTATTGACTGCG Sequence3 ACGCGGTGCCGTGTNATGCTGCATTATGCTCGACTGCGRCGGATGCTAGTATTGACTGCG Sequence4 ACGCGCTGCCGTGTCATCCTACACGATGCYAGACAGCGTCAGCTGCTAGTACTGGCTGAG Sequence5 ACGCGCTGTCGTGTCATACTGCAGGATGCTAGACTGCGTCAGCTGCTAGTACTGGCTGAG AGCTCGATGATCGGTGACGTAGACTCAGGGGCCATGCCGCGAGTTTGCGATGCG AGCACGGTGATCAATGACGTAGNCTCAGGRTCCACGCCGTGACTTTGTGATNCG 
AGCACGATGACCGATGACGTAGACTGAGGGTCCGTGCCGCGACTTTGTGATGCG ACCTCGGTGATTGATGACGTAGACTGCGGGTCCATGCCGCGATTTTGCGRTGCG ACCTCGATGCTCGATGACGTAGACTGCGGGTCCATGCCGTGATTTTGCGATGCG fastDNAml_1.2.2/testdata/test5WC.out010064400000410000013000000047070703414347600175240ustar00garyarchae00000400000020 fastDNAml, version 1.2.1, March 9, 1998 Based on Joseph Felsenstein's Nucleic acid sequence Maximum Likelihood method, version 3.3 5 Species, 114 Sites Quick add (only local branches initially optimized) in effect Site category Rate of change 1 0.062 2 0.125 3 0.250 4 0.500 5 1.000 6 2.000 7 4.000 8 8.000 9 16.000 A 32.000 Rearrangements of partial trees may cross 1 branch. Rearrangements of full tree may cross 1 branch. Total weight of positions in analysis = 60 There are 42 distinct data patterns (columns) Empirical Base Frequencies: A 0.21812 C 0.25686 G 0.30872 T(U) 0.21630 Transition/transversion ratio = 2.000000 (Transition/transversion parameter = 1.531841) Adding species: Sequence1 Sequence2 Sequence3 Sequence4 Tested 3 alternative trees Ln Likelihood = -192.11599 Sequence5 Tested 5 alternative trees Ln Likelihood = -207.45621 Doing local rearrangements Tested 4 alternative trees Ln Likelihood = -206.60058 Tested 4 alternative trees Examined 17 trees + Sequence5 +----------------------------------------3 ! +---------------- Sequence4 ---1 ! +------------------------ Sequence2 +----------------2 ! +---------------------------------------- Sequence3 ! +-------------------------------- Sequence1 Remember: this is an unrooted tree! Ln Likelihood = -206.60058 Between And Length Approx. 
Confidence Limits
 -------        ---            ------      ------- ---------- ------
     1          3               0.07563     (     zero,     0.16305) **
     3          Sequence5       0.01329     (     zero,     0.04273)
     3          Sequence4       0.04488     (     zero,     0.10050) **
     1          2               0.02816     (     zero,     0.08381)
     2          Sequence2       0.05576     (     zero,     0.12600) **
     2          Sequence3       0.08145     (  0.00425,     0.17027) **
     1          Sequence1       0.06884     (     zero,     0.14680) **

     *  = significantly positive, P < 0.05
     ** = significantly positive, P < 0.01

Tree also written to treefile.24391

fastDNAml_1.2.2/testdata/test5WC.phy
5 114 W C
Weights 111111111111001100000100011111100000000000000110000110000000
111101111111111111111111011100000111001011100000000011
C 10 0.0625 0.125 0.25 0.5 1 2 4 8 16 32
Categories 5111136343678975AAA8949995566778888889AAAAAA9239898629AAAAA9
633792246624457364222574877188898132984963499AA9899975
Sequence1 ACACGGTGTCGTATCATGCTGCAGGATGCTAGACTGCGTCANATGTTCGTACTAACTGTG
Sequence2 ACGCGGTGTCGTGTCATGCTACATTATGCTAGACTGCGTCGGATGCTCGTATTGACTGCG
Sequence3 ACGCGGTGCCGTGTNATGCTGCATTATGCTCGACTGCGRCGGATGCTAGTATTGACTGCG
Sequence4 ACGCGCTGCCGTGTCATCCTACACGATGCYAGACAGCGTCAGCTGCTAGTACTGGCTGAG
Sequence5 ACGCGCTGTCGTGTCATACTGCAGGATGCTAGACTGCGTCAGCTGCTAGTACTGGCTGAG
AGCTCGATGATCGGTGACGTAGACTCAGGGGCCATGCCGCGAGTTTGCGATGCG
AGCACGGTGATCAATGACGTAGNCTCAGGRTCCACGCCGTGACTTTGTGATNCG
AGCACGATGACCGATGACGTAGACTGAGGGTCCGTGCCGCGACTTTGTGATGCG
ACCTCGGTGATTGATGACGTAGACTGCGGGTCCATGCCGCGATTTTGCGRTGCG
ACCTCGATGCTCGATGACGTAGACTGCGGGTCCATGCCGTGATTTTGCGATGCG