Seq-Gen-1.3.4/README

This package is the generic source-code version. It can be compiled and run on most UNIX/Linux/Mac OS X systems, and it can also be compiled for Windows. For Mac OS X, a pre-compiled version is available from the website:

http://tree.bio.ed.ac.uk/software/Seq-Gen/

There is a manual in HTML format in the documentation/ directory of this package.

On most UNIX systems, to compile, type:

cd source
make

A binary called 'seq-gen' will be created in the source/ directory.

Any questions about Seq-Gen should be sent to: Andrew Rambaut

Seq-Gen-1.3.4/documentation/Seq-Gen.Manual.html

Seq-Gen: Simulation of molecular sequences

Seq-Gen

Sequence-Generator: An application for the Monte Carlo simulation of molecular sequence evolution along phylogenetic trees.
Version 1.3.4

© Copyright 1996-2017
Andrew Rambaut and Nicholas C. Grassly

Supported by The Royal Society of London

Institute of Evolutionary Biology,
University of Edinburgh,
Ashworth Laboratories,
King's Buildings,
Edinburgh EH9 2FL, U.K.

Originally developed at:
Department of Zoology,
University of Oxford,
South Parks Road,
Oxford OX1 3PS, U.K.

New features and bugs fixed in version 1.3.4 - 16 Sept 2017

New features and bugs fixed in version 1.3.3 - 28 Oct 2011

Bug fixed in version 1.3.2 - 7 Jan 2005

Bug fixed in version 1.3.1 - 4 Nov 2004

New Features in version 1.3 - 30 Aug 2004

Citation

If you use this program in a publication please cite the following reference:

Rambaut, A. and Grassly, N. C. (1997) Seq-Gen: An application for the Monte Carlo simulation of DNA sequence evolution along phylogenetic trees. Comput. Appl. Biosci. 13: 235-238.

Introduction

Seq-Gen is a program that will simulate the evolution of nucleotide or amino acid sequences along a phylogeny, using common models of the substitution process. A range of models of molecular evolution are implemented, including the general reversible model. Nucleotide or amino acid frequencies and other parameters of the model may be given, and site-specific rate heterogeneity may also be incorporated in a number of ways. Any number of trees may be read in and the program will produce any number of data sets for each tree, so large sets of replicate simulations can be created easily. It has been designed to be a general-purpose simulator that incorporates most of the commonly used (and computationally tractable) models of molecular sequence evolution. The paper cited above contains details of the algorithm and a short discussion about the uses of Seq-Gen.

For the purposes of this manual, we assume that the user is familiar with the theory and practice of molecular evolution as well as the use of their computer system.

Requirements

Seq-Gen is a command-line controlled program written in ANSI C. It should compile and run easily on any UNIX system or workstation (including Mac OS X). This manual describes the use of Seq-Gen on a UNIX machine. The application requires an amount of memory proportional to the size of each simulated sequence data set.

The Mac OS X version of Seq-Gen now functions in the same way as the UNIX version, using the 'Terminal' application. There is, however, a new graphical user interface, written by Thomas Wilcox, that can be used to run Seq-Gen on Mac OS X (and hopefully soon, Windows and Linux). This is available in the Mac OS X package for Seq-Gen:

http://evolve.zoo.ox.ac.uk/software/Seq-Gen/

Acknowledgements

A.R. is supported by a Royal Society University Research Fellowship and was previously supported by grant 50275 from The Wellcome Trust. N.C.G. is also supported by a Royal Society University Research Fellowship. We would like to thank Ziheng Yang for allowing us to use some invaluable code from PAML.

The models of substitution

All the models of molecular substitution implemented in Seq-Gen are time-reversible Markov models, and they assume that evolution is independent and identical at each site and along each lineage. Almost all models used in the maximum likelihood reconstruction of phylogenies from nucleotide sequences are processes of this type (but see Yang, 1994). Selecting either a nucleotide or an amino acid model of substitution determines which type of data is produced.
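To make the idea of a site evolving under a Markov model concrete, here is a minimal Python sketch (not Seq-Gen's code) that evolves one site along a single branch under JC69, the simplest of these models, whose transition probabilities have a closed form. The function names are hypothetical. The branch length t is assumed to be in expected substitutions per site, the convention described under the -s option below.

```python
import math
import random

NUCLEOTIDES = "ACGT"

def jc69_transition_probs(parent, t):
    """P(child state | parent state) after a branch of length t (expected substitutions per site)."""
    decay = math.exp(-4.0 * t / 3.0)
    p_same = 0.25 + 0.75 * decay   # probability the site ends in the same state
    p_diff = 0.25 - 0.25 * decay   # probability of each of the three other states
    return {b: (p_same if b == parent else p_diff) for b in NUCLEOTIDES}

def evolve_site(parent, t, rng):
    """Draw the child's nucleotide from the JC69 transition probabilities."""
    probs = jc69_transition_probs(parent, t)
    return rng.choices(NUCLEOTIDES, weights=[probs[b] for b in NUCLEOTIDES])[0]

rng = random.Random(42)
root = [rng.choice(NUCLEOTIDES) for _ in range(20)]   # random ancestral sequence
child = [evolve_site(base, 0.1, rng) for base in root]
print("".join(root))
print("".join(child))
```

Repeating this from the root towards the tips, branch by branch, gives one simulated alignment. Richer models replace the closed form with P(t) = exp(Qt) for a rate matrix Q.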

Nucleotide models of substitution

The Hasegawa, Kishino and Yano (HKY) model (Hasegawa et al., 1985) allows for different rates of transitions and transversions as well as unequal frequencies of the four nucleotides (base frequencies). The parameters required by this model are the transition to transversion ratio (TS/TV) and the base frequencies. A number of simpler models are special cases of the HKY model. If the base frequencies are set equal (by not specifying base frequencies), the model becomes equivalent to the Kimura 2-parameter (K2P) model (Kimura, 1980). If, in addition, the transition and transversion rates are set equal (by not specifying a TS/TV ratio), it becomes equivalent to the Jukes-Cantor (JC69) model (Jukes and Cantor, 1969).
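The TS/TV ratio is an expected ratio of transitions to transversions, whereas the HKY rate matrix is usually written in terms of a rate ratio, kappa. The sketch below shows the standard conversion for HKY, kappa = TS/TV x (piA + piG)(piC + piT) / (piA piG + piC piT), with frequencies assumed to be in the order A, C, G, T. The function names are hypothetical. This manual does not describe Seq-Gen's internal parameterization, and F84 uses a different relationship.

```python
def hky_kappa_from_tstv(tstv, freqs):
    """Convert an expected transition/transversion ratio into the HKY rate ratio kappa.

    freqs are the equilibrium frequencies in the order A, C, G, T.
    """
    pi_a, pi_c, pi_g, pi_t = freqs
    purines = pi_a + pi_g
    pyrimidines = pi_c + pi_t
    return tstv * purines * pyrimidines / (pi_a * pi_g + pi_c * pi_t)

def hky_tstv_from_kappa(kappa, freqs):
    """Inverse of hky_kappa_from_tstv."""
    pi_a, pi_c, pi_g, pi_t = freqs
    return kappa * (pi_a * pi_g + pi_c * pi_t) / ((pi_a + pi_g) * (pi_c + pi_t))

equal = (0.25, 0.25, 0.25, 0.25)
print(hky_kappa_from_tstv(0.5, equal))   # 1.0: transitions and transversions at equal rates
print(hky_kappa_from_tstv(3.0, (0.3, 0.2, 0.2, 0.3)))
```

With equal base frequencies, kappa is simply twice the TS/TV ratio. This is why a TS/TV of 0.5 corresponds to equal instantaneous rates of transitions and transversions.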

The F84 model (Felsenstein and Churchill, 1996), as implemented in DNAML in the PHYLIP package (Felsenstein, 1993), is very similar to HKY but differs slightly in how it treats transitions and transversions. This model requires the same parameters as HKY.

Finally, the general reversible process (GTR) model (e.g. Yang, 1994) allows 6 different rate parameters and is the most general model that is still consistent with the requirement of being reversible. The 6 parameters are the relative rates for each type of substitution (i.e. A to C, A to G, A to T, C to G, C to T and G to T). As this is a time-reversible process, the rate parameter of one type of substitution (e.g., A to T) is assumed to be the same as the reverse (e.g., T to A).
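The sketch below shows how the six GTR rates and the base frequencies can combine into an instantaneous rate matrix. It uses the standard construction Q[i][j] = r(i, j) x pi_j. The function name is hypothetical. Scaling Q to one expected substitution per unit time is an assumption here, chosen to match the convention that branch lengths are in expected substitutions per site. HKY is the special case with rates (1, kappa, 1, 1, kappa, 1), since A-G and C-T are the transitions.

```python
def gtr_rate_matrix(rates, freqs):
    """Build a normalised GTR instantaneous rate matrix Q.

    rates: relative rates for A-C, A-G, A-T, C-G, C-T, G-T.
    freqs: equilibrium frequencies for A, C, G, T.
    Q[i][j] = rate(i, j) * freqs[j] for i != j, and each row sums to zero.
    Q is scaled so the mean substitution rate, -sum_i freqs[i] * Q[i][i], equals 1.
    """
    n = 4
    pairs = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
    sym = [[0.0] * n for _ in range(n)]
    for (i, j), r in zip(pairs, rates):
        sym[i][j] = sym[j][i] = r   # time-reversible: rate(i, j) == rate(j, i)
    q = [[sym[i][j] * freqs[j] if i != j else 0.0 for j in range(n)] for i in range(n)]
    for i in range(n):
        q[i][i] = -sum(q[i])        # diagonal makes the row sum to zero
    mean_rate = -sum(freqs[i] * q[i][i] for i in range(n))
    return [[x / mean_rate for x in row] for row in q]

# HKY with kappa = 2 expressed as a GTR special case.
hky = gtr_rate_matrix([1.0, 2.0, 1.0, 1.0, 2.0, 1.0], [0.3, 0.2, 0.2, 0.3])
```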

Amino acid models of substitution

A number of empirical models of amino acid substitution are included with Seq-Gen. These include JTT (Jones et al., 1992), WAG (Whelan & Goldman, 2001), PAM (Dayhoff et al., 1978), Blosum62 (Henikoff & Henikoff, 1992), mtREV (Adachi & Hasegawa, 1996) and cpREV (Adachi et al., 2000). These models specify empirical relative rates of substitution and equilibrium amino acid frequencies. Alternatively, the frequencies can be specified or set to be equal. The GENERAL model allows the user to specify the relative rates of substitution and amino acid frequencies.

Site-specific rate heterogeneity

Site-specific rate heterogeneity allows different sites to evolve at different rates. Two models of rate heterogeneity are implemented. The first is a codon-based model in which the user may specify a different rate for each codon position. This can be used to simulate protein-coding sequences, in which the third codon position evolves faster than the first and second because a substitution at this position is considerably less likely to cause an amino acid substitution. Likewise, the first codon position is expected to evolve slightly faster than the second. Obviously, this can only be used with nucleotide models of substitution.

The second model of rate heterogeneity assigns different rates to different sites according to a gamma distribution (Yang, 1993). The distribution is scaled such that the mean rate for all the sites is 1 but the user must supply a parameter which describes its shape. A low value for this parameter (<1.0) simulates a large degree of site-specific rate heterogeneity and as this value increases the simulated data becomes more rate-homogeneous. This can be performed as a continuous model, i.e. every site has a different rate sampled from the gamma distribution of the given shape, or as a discrete model, i.e. each site falls into one of N rate categories approximating the gamma distribution. For a review of site-specific rate heterogeneity and its implications for phylogenetic analyses, see Yang (1996).
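The difference between the continuous and discrete models can be sketched in Python. A gamma distribution with shape alpha and scale 1/alpha has mean 1, which matches the scaling described above. For the discrete model, one common choice (Yang's method) uses the mean rate within each of N equal-probability categories. In this sketch those means are approximated by Monte Carlo, a simplification for illustration. Seq-Gen computes its categories differently, and the function names are hypothetical.

```python
import random

def continuous_gamma_rates(n_sites, alpha, rng):
    """One rate per site from a gamma distribution with shape alpha and mean 1.

    random.gammavariate(alpha, beta) has mean alpha * beta, so beta = 1 / alpha.
    """
    return [rng.gammavariate(alpha, 1.0 / alpha) for _ in range(n_sites)]

def discrete_gamma_rates(alpha, n_categories, rng, n_draws=100_000):
    """Approximate the mean rate of each equal-probability gamma category by simulation."""
    draws = sorted(rng.gammavariate(alpha, 1.0 / alpha) for _ in range(n_draws))
    size = n_draws // n_categories
    rates = [sum(draws[k * size:(k + 1) * size]) / size for k in range(n_categories)]
    mean = sum(rates) / n_categories
    return [r / mean for r in rates]   # rescale so the category rates average exactly 1

rng = random.Random(2017)
categories = discrete_gamma_rates(0.5, 4, rng)              # low alpha: strong heterogeneity
site_rates = [rng.choice(categories) for _ in range(10)]   # each site falls into one category
```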

Seq-Gen also implements the invariable sites model. With this model, a specified proportion of sites is expected to be invariable across the whole tree. The expected number of substitutions then falls on the remaining variable sites.
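The statement that the expected substitutions fall on the remaining variable sites suggests that variable-site rates are scaled up by 1/(1 - p). The following is a minimal Python sketch under that assumption. The function name is hypothetical, and the code is not taken from Seq-Gen's source.

```python
import random

def apply_invariable_sites(rates, p_inv, rng):
    """Mark each site invariable (rate 0) with probability p_inv.

    Assumption for this sketch: variable sites are scaled by 1 / (1 - p_inv), so the
    expected mean rate over all sites stays 1 and the variable sites carry all the
    expected substitutions.
    """
    if not 0.0 <= p_inv < 1.0:
        raise ValueError("proportion of invariable sites must be >= 0.0 and < 1.0")
    scale = 1.0 / (1.0 - p_inv)
    return [0.0 if rng.random() < p_inv else r * scale for r in rates]

rng = random.Random(3)
rates = apply_invariable_sites([1.0] * 100_000, 0.25, rng)
```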

The final way of introducing site-specific rate heterogeneity is to specify a number of partitions and give these partitions relative rates. See section 'Input File Format', below, for details about how to do this.

Compilation and Execution

Seq-Gen is written in ANSI C and should compile on most UNIX systems and workstations. In this manual I will describe the process of installation and compilation on a UNIX system. This applies to Mac OS X if run under the Terminal. Alternatively, a Mac OS X package is available that contains a user-interface program to run Seq-Gen.

Compilation on UNIX

A simple Makefile is included in the package. You should edit this and change the name of the compiler (by default this is cc) and add any flags for optimisation on your system (an example is given for SUN compilers). Once this is done, type:

make

The program will then compile and you will have an executable program: seq-gen.

Running Seq-Gen

To run Seq-Gen you type:

seq-gen [parameters] < [trees] > [sequences]

where [parameters] are the parameters for the program (described below), [trees] is the tree file and [sequences] is the name of the file that will contain the simulated sequences. The tree file must contain one or more trees in PHYLIP format (see below).

The sequences produced by Seq-Gen are written to the standard output (and can thus be redirected to the output file using > [filename]). Other information and results are written to the standard error and thus will appear on the screen.

Seeding the random number generator

Seq-Gen uses a pseudo-random number generator called MT19937 devised by Makoto Matsumoto (see the website http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/emt.html for more details). Like all pseudo-random number generators, this produces a sequence of random numbers for a given 'seed' number. This is why it is pseudo-random: for a particular seed, the algorithm will always give the same sequence of numbers (and thus the same simulated sequences out of Seq-Gen).
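CPython's random module is also built on MT19937, so it can illustrate how a fixed seed reproduces the same stream of draws. This is a sketch only, and random_sequence is a hypothetical helper. The actual numbers will not match Seq-Gen's, because its seeding and the way draws become bases are different.

```python
import random

def random_sequence(seed, length):
    """Draw a uniformly random DNA sequence from an MT19937 generator seeded with `seed`."""
    rng = random.Random(seed)   # random.Random is a Mersenne Twister (MT19937)
    return "".join(rng.choice("ACGT") for _ in range(length))

first = random_sequence(12345, 30)
again = random_sequence(12345, 30)
other = random_sequence(12346, 30)
print(first == again)   # True: same seed, same sequence
print(first == other)
```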

The -z option can be used to specify a seed (a non-zero, positive integer). When Seq-Gen is run without the -z option, the seed it chose is printed to the screen. If this number is noted down and then specified using the -z option (with exactly the same simulation settings), Seq-Gen will generate exactly the same simulated data. Thus, by recording the options and seed, it is possible to regenerate a simulated data set. This is useful if the simulated data sets are extremely large: they can be deleted and then reconstructed if required. If you do this, keep a copy of the exact version of Seq-Gen you used, because this technique may not work when the seed came from a different version of Seq-Gen.

By default, Seq-Gen uses as its seed the time taken from the system clock (it actually combines the number of seconds that have passed since 1970 with a millisecond timer). This means that if sequential runs of Seq-Gen are done very quickly (e.g., short runs using a script), the default seeds could be very similar or even identical. This has serious consequences for the independence of the simulated data. Basically, it is inadvisable to call Seq-Gen many times from a script, running one simulation per call. If the -n option is used to produce multiple simulated data sets from a single call, this will ensure adequate independence. If it is necessary to script short runs of Seq-Gen, we have two suggestions to avoid the above problem. Firstly, the -z option could be used to call Seq-Gen multiple times in sequence, deriving a seed for each call from another random number generator. Secondly, adequate time could be allowed to elapse between calls to Seq-Gen (either by simulating many data sets and then using only the first, or by calling a time-wasting function in your scripting language). We can't predict whether either of these solutions will work properly, so please use caution.
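The first suggestion, deriving a seed for each call from another random number generator, might look like the Python sketch below. It only builds the command lines and does not run seq-gen. The helper name, the model options, and the upper bound of 2^31 on seeds are illustrative assumptions. The seed is attached directly to -z, following the attached style of the example command later in this manual (e.g. -t3.0).

```python
import random
import shlex

def seqgen_commands(master_seed, n_calls, base_args):
    """Build one seq-gen command line per call, each with its own -z seed.

    Seeds are drawn without replacement from a master generator, so no two calls
    share a seed and the whole batch can be regenerated from master_seed alone.
    """
    master = random.Random(master_seed)
    seeds = master.sample(range(1, 2**31), n_calls)   # non-zero, positive, distinct
    return [["seq-gen", *base_args, f"-z{seed}"] for seed in seeds]

commands = seqgen_commands(2017, 5, ["-mHKY", "-t3.0", "-l1000"])
for cmd in commands:
    print(shlex.join(cmd) + " < example.tree")
```

The printed lines can be pasted into a shell script. Recording master_seed is then enough to regenerate every call in the batch.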

Input file format

The tree format is the same as that used by PHYLIP (also called the "Newick" format). This is a nested set of bifurcations defined by brackets and commas. Here are two examples:

(((Taxon1:0.2,Taxon2:0.2):0.1,Taxon3:0.3):0.1,Taxon4:0.4);

((Taxon1:0.1,Taxon2:0.2):0.05,Taxon3:0.3,Taxon4:0.4);

The first is a rooted tree because it has a bifurcation at the highest level. The next tree is unrooted - it has a trifurcation at the top level. Each tree should finish with a semicolon. Any number of trees may be in the input file separated by a semi-colon and a new-line. Whilst PHYLIP only allows taxon names of up to 10 characters, Seq-Gen can read trees with taxon names of up to 256 characters. Unless the -o option is set (see below), the output file will conform to the PHYLIP format and the names will be truncated to 10 characters. Note that this could cause some taxon names to be identical and this can cause problems in some phylogenetic packages.
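The rooted/unrooted distinction comes down to how many subtrees meet at the outermost level. This can be checked by counting commas at bracket depth one. The Python sketch below uses hypothetical function names and is not Seq-Gen's tree reader. It also skips an optional partition prefix such as [400], which is described below.

```python
def top_level_children(newick):
    """Count the subtrees joined at the outermost level of a Newick string."""
    tree = newick.strip().rstrip(";")
    if tree.startswith("["):                 # skip a partition prefix such as [400]
        tree = tree[tree.index("]") + 1:]
    depth = 0
    children = 1
    for ch in tree:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == "," and depth == 1:       # a comma directly inside the outermost brackets
            children += 1
    return children

def is_rooted(newick):
    """Rooted = bifurcation at the top level; a trifurcation there means unrooted."""
    return top_level_children(newick) == 2

print(is_rooted("(((Taxon1:0.2,Taxon2:0.2):0.1,Taxon3:0.3):0.1,Taxon4:0.4);"))   # True
print(is_rooted("((Taxon1:0.1,Taxon2:0.2):0.05,Taxon3:0.3,Taxon4:0.4);"))        # False
```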

Optionally, the user can supply a sequence alignment as input, as well as the trees. This should be in relaxed PHYLIP format. The trees can then be placed in this file at the end, after a line stating how many trees there are. The file may look like this:

4 40
Taxon1 ATCTTTGTAGTCATCGCCGTATTAGCATTCTTAGATCTAA
Taxon2 ATCCTAGTAGTCGCTTGCGCACTAGCCTTCCGAAATCTAG
Taxon3 ACTTCTGTGTTTACTGAGCTACTAGCTTCCCTAAATCTAG
Taxon4 ATTCCTATATTCGCTAATTTCTTAGCTTTCCTGAATCTGG
1
(((Taxon1:0.2,Taxon2:0.2):0.1,Taxon3:0.3):0.1,Taxon4:0.4);

Note that the labels in the alignment do not have to match those in the tree (the ones in the tree will be used for output) - there doesn't even have to be the same number of taxa in the alignment as in the trees. The sequence length supplied by the alignment will be used to obtain the simulated sequence length (unless the -l option is set). The -k option also refers to one of the sequences to specify the ancestral sequence.
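When many such input files are needed, they can be generated. The Python sketch below writes the layout shown in the example: a relaxed PHYLIP header, one "name sequence" line per taxon, the number of trees, then the trees. The function name is hypothetical. The sketch covers only the layout described here, and it is not a general PHYLIP writer.

```python
def seqgen_input(alignment, trees):
    """Format an alignment (name -> sequence) plus trees as one Seq-Gen input text."""
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all sequences must have the same length")
    lines = [f"{len(alignment)} {lengths.pop()}"]              # "<taxa> <sites>"
    lines += [f"{name} {seq}" for name, seq in alignment.items()]
    lines.append(str(len(trees)))                              # number of trees that follow
    lines += [tree if tree.endswith(";") else tree + ";" for tree in trees]
    return "\n".join(lines) + "\n"

text = seqgen_input({"Taxon1": "ACGT", "Taxon2": "ACGA"}, ["(Taxon1:0.1,Taxon2:0.2)"])
print(text, end="")
```

The resulting file is passed to Seq-Gen on standard input, just like a plain tree file. Its sequences can then be referred to with the -k option.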

Data partitions with different trees

The user can input different trees for different partitions of the dataset. A partition is a set of contiguous sites that has evolved under a single tree. Using multiple partitions with different trees, a recombinant history for the sequences can be simulated. Assuming a 1000 bp sequence length and 2 partitions consisting of 400bp and 600bp respectively, the following input treefile could be used:

[400](((Taxon1:0.2,Taxon2:0.2):0.1,Taxon3:0.3):0.1,Taxon4:0.4);
[600]((Taxon1:0.1,Taxon3:0.2):0.05,Taxon2:0.3,Taxon4:0.4);

Note the partition lengths in square brackets before each tree. These must sum to the specified total sequence length (given by the -l option). Multiple sets of partition trees may be input, with different trees, numbers of partitions and partition lengths. Seq-Gen will work out the number of partitions for each replicate from the partition lengths (the maximum number of partitions must be given by the -p option).

For example:

[400](((Taxon1:0.2,Taxon2:0.2):0.1,Taxon3:0.3):0.1,Taxon4:0.4);
[600]((Taxon1:0.1,Taxon3:0.2):0.05,Taxon2:0.3,Taxon4:0.4);
[300](((Taxon1:0.2,Taxon2:0.2):0.1,Taxon3:0.3):0.1,Taxon4:0.4);
[400]((Taxon1:0.1,Taxon3:0.2):0.05,Taxon2:0.3,Taxon4:0.4);
[300]((Taxon1:0.1,Taxon2:0.2):0.05,Taxon3:0.3,Taxon4:0.4);

will generate 2 datasets, the first consisting of 2 partitions (400bp and 600bp) and the second consisting of 3 partitions (300bp, 400bp and 300bp).
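The grouping rule described above can be sketched in Python: partition trees are accumulated until their lengths reach the total sequence length, at which point a data set is complete. group_partitions is a hypothetical name. The sketch is not Seq-Gen's parser, and it handles only the plain [length] prefix.

```python
import re

PARTITION = re.compile(r"^\[\s*(\d+)\s*\]\s*(.+)$")

def group_partitions(lines, total_length):
    """Split '[length]tree' lines into data sets whose lengths sum to total_length."""
    datasets, current, used = [], [], 0
    for line in lines:
        match = PARTITION.match(line.strip())
        if not match:
            raise ValueError(f"not a partition tree: {line!r}")
        length = int(match.group(1))
        current.append((length, match.group(2)))
        used += length
        if used == total_length:              # this data set is complete
            datasets.append(current)
            current, used = [], 0
        elif used > total_length:
            raise ValueError("partition lengths overshoot the sequence length")
    if current:
        raise ValueError("trailing partitions do not add up to the sequence length")
    return datasets

example = [
    "[400](((Taxon1:0.2,Taxon2:0.2):0.1,Taxon3:0.3):0.1,Taxon4:0.4);",
    "[600]((Taxon1:0.1,Taxon3:0.2):0.05,Taxon2:0.3,Taxon4:0.4);",
    "[300](((Taxon1:0.2,Taxon2:0.2):0.1,Taxon3:0.3):0.1,Taxon4:0.4);",
    "[400]((Taxon1:0.1,Taxon3:0.2):0.05,Taxon2:0.3,Taxon4:0.4);",
    "[300]((Taxon1:0.1,Taxon2:0.2):0.05,Taxon3:0.3,Taxon4:0.4);",
]
datasets = group_partitions(example, 1000)
print(max(len(d) for d in datasets))   # value to pass to -p
```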

Data partitions with different rates

The user can also input the same tree for all partitions of the dataset and then specify a relative rate of evolution for each. This allows partition rate heterogeneity. The relative rates should have a mean of 1.0 (although, if they don't the program will scale them so that they do).

For example:

[300, 0.5](((Taxon1:0.2,Taxon2:0.2):0.1,Taxon3:0.3):0.1,Taxon4:0.4);
[400, 1.75](((Taxon1:0.2,Taxon2:0.2):0.1,Taxon3:0.3):0.1,Taxon4:0.4);
[300, 0.75](((Taxon1:0.2,Taxon2:0.2):0.1,Taxon3:0.3):0.1,Taxon4:0.4);

will generate 3 partitions (300bp, 400bp and 300bp) with relative rates of 0.5, 1.75 and 0.75 along the same tree.
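A Python sketch of reading the [length, rate] prefix and rescaling the rates is shown below. The function name is hypothetical. The rescaling uses the plain arithmetic mean of the rates, which is how "a mean of 1.0" reads here. This manual does not say whether Seq-Gen weights the mean by partition length, so treat that choice as an assumption. The example rates 0.5, 1.75 and 0.75 already average exactly 1.0, so they come back unchanged.

```python
import re

PARTITION_WITH_RATE = re.compile(r"^\[\s*(\d+)\s*,\s*([0-9]*\.?[0-9]+)\s*\]\s*(.+)$")

def partition_rates(lines):
    """Parse '[length, rate]tree' lines and rescale rates so their arithmetic mean is 1."""
    parts = []
    for line in lines:
        match = PARTITION_WITH_RATE.match(line.strip())
        if not match:
            raise ValueError(f"expected [length, rate]tree, got {line!r}")
        parts.append((int(match.group(1)), float(match.group(2)), match.group(3)))
    mean = sum(rate for _, rate, _ in parts) / len(parts)   # unweighted mean (assumption)
    return [(length, rate / mean, tree) for length, rate, tree in parts]

tree = "(((Taxon1:0.2,Taxon2:0.2):0.1,Taxon3:0.3):0.1,Taxon4:0.4);"
parts = partition_rates([f"[300, 0.5]{tree}", f"[400, 1.75]{tree}", f"[300, 0.75]{tree}"])
```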

Output File Format

The default format for the output files was chosen for its simplicity and for the wide range of programs that use it. All of the programs in the PHYLIP package that accept molecular sequences can analyse multiple data sets in the format produced by Seq-Gen. Seq-Gen can now also generate NEXUS format output (see the -o option, below) for use with the PAUP program (Swofford, 1993). A PAUP command block (or any other text) can be inserted between each simulated dataset to automate the analysis process (see the -x option, below).

Parameters to control Seq-Gen

Options and parameters to Seq-Gen are supplied on the command line. The general format is a minus sign followed by a code letter. If required, the values of any parameters come after the code, separated from both code and each other with either a comma or a space. Some options act like switches and require no parameters. The case of the options is ignored.

Model

This option sets the model of nucleotide or amino acid substitution. For nucleotides there is a choice of F84, HKY (also known as HKY85) or GTR (the general reversible Markov model, also known as REV). The first two models are quite similar but not identical. They both require a transition transversion ratio and relative base frequencies as parameters. Other models such as K2P, F81 and JC69 are special cases of HKY and can be obtained by setting the nucleotide frequencies equal (for K2P), the transition transversion ratio to 0.5 (for F81), or both (for JC69). A transition transversion ratio of 0.5 is equivalent to equal rates of transitions and transversions because there are twice as many possible transversions.

For amino acid models there is the choice of JTT, WAG, PAM, BLOSUM, MTREV or CPREV. By default these models specify both the relative rates of substitution and the amino acid frequencies. The GENERAL model is the amino acid equivalent of the general time-reversible model; its relative rate matrix and amino acid frequencies can be specified using the -r and -f options, respectively.

The usage is:

-m <MODEL>

Where <MODEL> is the model name: HKY, F84, GTR, JTT, WAG, PAM, BLOSUM, MTREV, CPREV or GENERAL. For compatibility with older versions REV is treated as synonymous with GTR.
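A wrapper script can validate -m values before calling Seq-Gen. The Python sketch below encodes two rules: option case is ignored (the PAUP comparison file in this package writes -mhky and -mrev in lower case), and REV is a synonym for GTR. The names are hypothetical, and the model list is copied from this section.

```python
NUCLEOTIDE_MODELS = {"HKY", "F84", "GTR"}
AMINO_ACID_MODELS = {"JTT", "WAG", "PAM", "BLOSUM", "MTREV", "CPREV", "GENERAL"}

def normalise_model(name):
    """Map a -m value to (canonical model name, data type it produces)."""
    model = name.strip().upper()          # case is ignored
    if model == "REV":                    # kept for compatibility with older versions
        model = "GTR"
    if model in NUCLEOTIDE_MODELS:
        return model, "nucleotide"
    if model in AMINO_ACID_MODELS:
        return model, "amino acid"
    raise ValueError(f"unknown model: {name!r}")

print(normalise_model("rev"))   # ('GTR', 'nucleotide')
```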

Length of Sequences

This option allows the user to set the length, in nucleotides or amino acids, of each simulated sequence.

-l <SEQUENCE_LENGTH>

Where <SEQUENCE_LENGTH> is an integer number greater than zero. If an alignment is supplied as input and this option is not set, then Seq-Gen will use the length of the sequences in the alignment.

Number of Replicate Datasets

This option specifies how many separate datasets should be simulated for each tree in the tree file.

-n <NUMBER_OF_DATASETS>

Where <NUMBER_OF_DATASETS> is an integer number that corresponds to the number of datasets to be simulated.

Number of Data Partitions

This option specifies how many partitions of each data set should be simulated. Each partition must have its own tree and a number specifying how many sites are in the partition. See 'Input file format', above, for details. If multiple sets of trees are input with varying numbers of partitions, this should specify the maximum number of partitions that will be required.

-p <NUMBER_OF_PARTITIONS>

Where <NUMBER_OF_PARTITIONS> is an integer number that corresponds to the number of partitions for each dataset.

Scale branch lengths

This option allows the user to set a value with which to scale the branch lengths in order to make them equal the expected number of substitutions per site for each branch. Basically Seq-Gen multiplies each branch length by this value.

-s <SCALE>

Where <SCALE> is a decimal number greater than zero. For example, if you give a value of 0.5, each branch length will be halved before it is used to simulate the sequences.
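The effect of -s can be previewed on a tree string. The Python sketch below multiplies every ":length" in a Newick tree by the scale factor. The function name and regular expression are assumptions for illustration. Seq-Gen applies the scaling internally rather than rewriting the tree text.

```python
import re

BRANCH_LENGTH = re.compile(r":\s*([0-9]*\.?[0-9]+(?:[eE][-+]?[0-9]+)?)")

def scale_branch_lengths(newick, scale):
    """Multiply every ':length' in a Newick string by scale."""
    if scale <= 0:
        raise ValueError("scale must be greater than zero")
    return BRANCH_LENGTH.sub(lambda m: f":{float(m.group(1)) * scale:.12g}", newick)

print(scale_branch_lengths("(((Taxon1:0.2,Taxon2:0.2):0.1,Taxon3:0.3):0.1,Taxon4:0.4);", 0.5))
```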

Scale tree length

This option allows the user to set the desired length of each tree in units of substitutions per site. The term tree length here means the distance from the root to any one of the tips, in units of mean number of substitutions per site. This option can only be used when the input trees are rooted and ultrametric (no difference in rate amongst the lineages). It has the effect of making all the trees in the input file the same length before simulating data.

-d <SCALE>

Where <SCALE> is a decimal number greater than zero. The option multiplies each branch length by a value equal to SCALE divided by the actual length of the tree.
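The factor described above (SCALE divided by the actual tree length) can be computed with a small Newick parser that measures every root-to-tip distance. It also checks that the tree is ultrametric. The Python sketch below uses hypothetical function names and a tolerance chosen for illustration. It is not Seq-Gen's tree reader, and it does not check that the tree is rooted.

```python
def root_to_tip(newick):
    """Return {taxon: distance from the root} for a Newick tree with branch lengths."""
    text = newick.strip().rstrip(";")
    if text.startswith("["):                  # drop a partition prefix such as [400]
        text = text[text.index("]") + 1:]
    pos = 0

    def parse_length():
        nonlocal pos
        if pos >= len(text) or text[pos] != ":":
            return 0.0                        # a missing branch length counts as zero
        pos += 1
        start = pos
        while pos < len(text) and text[pos] not in ",)":
            pos += 1
        return float(text[start:pos])

    def parse_subtree():
        """Parse one subtree; return [(taxon, distance from this subtree's root)]."""
        nonlocal pos
        if text[pos] != "(":                  # a tip: read its name
            start = pos
            while text[pos] not in ":,)":
                pos += 1
            return [(text[start:pos].strip(), 0.0)]
        pos += 1                              # consume "("
        tips = []
        while True:
            child = parse_subtree()
            length = parse_length()           # the branch length follows the child
            tips += [(name, d + length) for name, d in child]
            if text[pos] == ",":
                pos += 1
                continue
            pos += 1                          # consume ")"
            break
        while pos < len(text) and text[pos] not in ":,)":
            pos += 1                          # skip an optional internal-node label
        return tips

    return dict(parse_subtree())

def tree_length_factor(newick, target, tol=1e-6):
    """Factor -d is described as applying: target divided by the root-to-tip length."""
    depths = list(root_to_tip(newick).values())
    if max(depths) - min(depths) > tol:
        raise ValueError("tree is not ultrametric, so it has no single root-to-tip length")
    return target / max(depths)

tree = "(((Taxon1:0.2,Taxon2:0.2):0.1,Taxon3:0.3):0.1,Taxon4:0.4);"
factor = tree_length_factor(tree, 0.1)   # every branch length would be multiplied by this
```

For this tree every tip lies 0.4 from the root, so asking for a tree length of 0.1 multiplies each branch by about 0.25.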

Codon-Specific Rate Heterogeneity

Using this option the user may specify the relative rates for each codon position. This allows codon-specific rate heterogeneity to be simulated. The default is no site-specific rate heterogeneity. This option can only be used with nucleotide substitution models.

-c <CODON_POSITION_RATES>

Where <CODON_POSITION_RATES> is three decimal numbers that specify the relative rates of substitution at each codon position, separated by commas or spaces.
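The Python sketch below parses the three values (commas or spaces, as allowed above) and gives each site the rate of its codon position, assuming positions 1, 2 and 3 repeat from the first site. The function names are hypothetical. The example values come from the PAUP comparison file in this package and happen to average 1.0. Whether Seq-Gen rescales values that do not average 1.0 is not stated here.

```python
def parse_codon_option(value):
    """Parse '0.3,0.2,2.5' or '0.3 0.2 2.5' into three floats."""
    rates = [float(x) for x in value.replace(",", " ").split()]
    if len(rates) != 3:
        raise ValueError("-c takes exactly three rates")
    return rates

def codon_site_rates(codon_rates, n_sites):
    """Relative rate for each site: codon positions 1, 2, 3 repeating along the sequence."""
    return [codon_rates[i % 3] for i in range(n_sites)]

rates = codon_site_rates(parse_codon_option("0.3,0.2,2.5"), 9)
```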

Gamma Rate Heterogeneity

Using this option the user may specify the shape parameter, alpha, of the gamma distribution used for rate heterogeneity. The default is no site-specific rate heterogeneity.

-a <ALPHA>

Where <ALPHA> is a real number >0 that specifies the shape of the gamma distribution to use with gamma rate heterogeneity. If this is used with the -g option, below, then a discrete model is used, otherwise a continuous one.

Discrete Gamma Rate Heterogeneity

Using this option the user may specify the number of categories for the discrete gamma rate heterogeneity model. The default is no site-specific rate heterogeneity (or the continuous model if only the -a option is specified).

-g <NUM_CATEGORIES>

Where <NUM_CATEGORIES> is an integer number between 2 and 32 that specifies the number of categories to use with the discrete gamma rate heterogeneity model.

Proportion of Invariable Sites

Using this option the user may specify the proportion of sites that should be invariable. These sites will be chosen randomly with this expected frequency. The default is no invariable sites. Invariable sites are sites that cannot change as opposed to sites which don't exhibit any changes due to chance (and perhaps a low rate).

-i <PROPORTION_INVARIABLE>

Where <PROPORTION_INVARIABLE> is a real number >= 0.0 and < 1.0 that specifies the proportion of invariable sites.

Relative State Frequencies

This option is used to specify the equilibrium frequencies of the four nucleotides or twenty amino acids. If simulating nucleotides, the default (when no frequencies are specified) will be that all frequencies are equal. When simulating amino acids the default frequencies will be set to the empirical values associated with the specified substitution model (with the exception of the GENERAL model which has a default of equal frequencies).

-f <STATE_FREQUENCIES>

Where <STATE_FREQUENCIES> are either four decimal numbers for the frequencies of the nucleotides A, C, G and T or 20 numbers for the amino acid frequencies (in the order ARNDCQEGHILKMFPSTWYV) separated by spaces or commas.

-fe

This results in all the frequencies being set equal. This can be used to force equal frequencies for empirical amino acid models.
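A wrapper might check -f values before passing them on. The Python sketch below maps four numbers to A, C, G, T and twenty numbers to ARNDCQEGHILKMFPSTWYV, the orders given above. The function name is hypothetical. Rejecting values that do not sum to 1 is a choice made for this sketch. Whether Seq-Gen instead rescales such values is not covered in this manual.

```python
NUCLEOTIDE_ORDER = "ACGT"
AMINO_ACID_ORDER = "ARNDCQEGHILKMFPSTWYV"

def parse_frequencies(value, tol=1e-6):
    """Map the numbers given to -f onto states, using the orders stated above."""
    numbers = [float(x) for x in value.replace(",", " ").split()]
    if len(numbers) == 4:
        order = NUCLEOTIDE_ORDER
    elif len(numbers) == 20:
        order = AMINO_ACID_ORDER
    else:
        raise ValueError("expected 4 or 20 frequencies")
    if abs(sum(numbers) - 1.0) > tol:         # sketch's own check, not Seq-Gen behaviour
        raise ValueError("frequencies should sum to 1")
    return dict(zip(order, numbers))

print(parse_frequencies("0.3,0.2,0.2,0.3"))
```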

Transition Transversion Ratio

This option allows the user to set a value for the transition transversion ratio (TS/TV). This is only valid when either the HKY or F84 model has been selected. The default is a TS/TV of 0.5, which gives equal instantaneous rates of transitions and transversions. Thus omitting the -t option with the -mHKY option results in the F81 model (or the JC69 model if the -f option is also omitted).

-t <TRANSITION_TRANSVERSION_RATIO>

Where <TRANSITION_TRANSVERSION_RATIO> is a decimal number greater than zero.

General Reversible Rate Matrix

This option allows the user to set values for the relative rates of substitution between nucleotide or amino acid states. This is only valid when either the GTR (nucleotides) or GENERAL (amino acids) model has been selected.

-r <RATE_MATRIX_VALUES>

Where <RATE_MATRIX_VALUES> are decimal numbers for the relative rates of substitution (for nucleotides) from A to C, A to G, A to T, C to G, C to T and G to T, respectively, separated by spaces or commas. The matrix is symmetrical, so each reverse substitution has the same rate as the one set (e.g. C to A equals A to C); therefore only six values need be set for nucleotides. For amino acids, 190 rates are required, representing the upper (off-diagonal) triangle of a 20x20 matrix with amino acids in the order ARNDCQEGHILKMFPSTWYV.
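The Python sketch below expands the listed upper-triangle values into a full symmetric matrix. It checks the 6 and 190 counts, since n(n - 1)/2 gives 6 for four states and 190 for twenty. The function name is hypothetical. The row-by-row ordering of the amino acid values is an assumption that mirrors the nucleotide order (A-C, A-G, A-T, C-G, C-T, G-T). This manual does not state that ordering explicitly for amino acids.

```python
AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"

def symmetric_rate_matrix(upper, states):
    """Expand off-diagonal upper-triangle rates (row by row) into a full symmetric matrix."""
    n = len(states)
    expected = n * (n - 1) // 2              # 6 for nucleotides, 190 for amino acids
    if len(upper) != expected:
        raise ValueError(f"expected {expected} rates, got {len(upper)}")
    matrix = [[0.0] * n for _ in range(n)]
    values = iter(upper)
    for i in range(n):
        for j in range(i + 1, n):
            matrix[i][j] = matrix[j][i] = next(values)   # rate(i, j) == rate(j, i)
    return matrix

nuc = symmetric_rate_matrix([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], "ACGT")
aa = symmetric_rate_matrix([float(k) for k in range(1, 191)], AMINO_ACIDS)
```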

Ancestral Sequence

This option allows the user to use a supplied sequence as the ancestral sequence at the root (otherwise a random sequence is used).

-k <ANCESTRAL_SEQUENCE_NUMBER>

Where <ANCESTRAL_SEQUENCE_NUMBER> is an integer number greater than zero which refers to one of the sequences supplied as input (see 'Input File Format', above).

Random Number Seed

This option allows the user to specify a seed for the random number generator. Using the same seed (with the same input) will result in identical simulated datasets. This is useful because you can then delete the (often large) simulated sequence files to save disk space. To recreate a set of simulations, you must use exactly the same model options. The default is to obtain a seed from the system clock, which will be displayed on the screen so that it can be noted down.

-z <RANDOM_NUMBER_SEED>

Where <RANDOM_NUMBER_SEED> is an integer number.

Output file format

This option selects the format of the output file. The default is PHYLIP format.

-op

PHYLIP format.

-or

Relaxed PHYLIP format: PHYLIP format expects exactly 10 characters for the name (padded with spaces if the name is shorter than 10). With this option the output file can have up to 256 characters in the name, followed by a single space before the sequence. The longer taxon names are read from the tree. Some programs can read this format, and it preserves long taxon names.

-on

NEXUS format: This creates a NEXUS file which will load into PAUP. It generates one DATA block per dataset. It also includes the simulation settings as comments which will be ignored by PAUP.
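The difference between the default and relaxed name layouts, and the risk of clashes once names are cut to 10 characters (noted under 'Input file format'), can be sketched in Python. The function names and the example taxa are hypothetical.

```python
def phylip_line(name, sequence):
    """Strict PHYLIP: the name is cut or padded to exactly 10 characters."""
    return name[:10].ljust(10) + sequence

def relaxed_phylip_line(name, sequence):
    """Relaxed PHYLIP (-or): the full name (up to 256 characters) then one space."""
    if len(name) > 256:
        raise ValueError("taxon names are limited to 256 characters")
    return f"{name} {sequence}"

def truncation_clashes(names):
    """Groups of names that become identical once cut to 10 characters."""
    seen = {}
    for name in names:
        seen.setdefault(name[:10], []).append(name)
    return {short: full for short, full in seen.items() if len(full) > 1}

print(truncation_clashes(["Drosophila_melanogaster", "Drosophila_simulans", "Homo"]))
```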

Insert Text File into Output

This option allows the user to specify a text file which will be inserted into the output file after every dataset. This allows the user to insert a PAUP command block or a tree (or anything else) into the file to automate the analysis.

-x <TEXT_FILE_NAME>

Where <TEXT_FILE_NAME> is the name of a file. For Macintosh users this file must be in the same folder as the Seq-Gen program (I find it convenient to copy the Seq-Gen program into the folder containing my data files and then delete it afterwards). For UNIX users, this can be specified with a path or should be in the current directory (the one you were in when you ran Seq-Gen). This option plus NEXUS format (-on option) replaces the previously included separate program, Phy2Nex.

Write Ancestral Sequences

This option allows the user to obtain the sequences for each of the internal nodes in the tree. The sequences are written out along with the sequences for the tips of the tree in relaxed PHYLIP format (see above).

-wa

Write Site Rates

This option allows the user to obtain the relative rate of substitution for each site as used in each simulation. This will go to stderr (or the screen) and will be produced for each replicate simulation.

-wr

Minimum Information

This option prevents any output except the final trees and any error messages.

-q

Help

This option prints a help message describing the options and then quits.

-h

An example of performing simulations using Seq-Gen

An example phylogeny is included with this package (called 'example.tree'). This is an unrooted tree in PHYLIP format (see 'Input file format', above). To use this tree to simulate 3 sets of sequences 40 nucleotides long using the HKY model, a transition-transversion ratio of 3.0 and unequal base frequencies, type the following:

seq-gen -mHKY -t3.0 -f0.3,0.2,0.2,0.3 -l40 -n3 < example.tree > example.dat

This produces a PHYLIP format data file called 'example.dat', which looks something like this:

4 40
Taxon1 ATCTTTGTAGTCATCGCCGTATTAGCATTCTTAGATCTAA
Taxon2 ATCCTAGTAGTCGCTTGCGCACTAGCCTTCCGAAATCTAG
Taxon3 ACTTCTGTGTTTACTGAGCTACTAGCTTCCCTAAATCTAG
Taxon4 ATTCCTATATTCGCTAATTTCTTAGCTTTCCTGAATCTGG
4 40
Taxon1 AGAACACAAGTCCCAAATAACCGAACTGGAGCGGGCAGTT
Taxon2 AAGACACAGGCCTAAACTGAGCGTACTGGAACGAGTCGTT
Taxon3 AAGACACAGGTCTCACTTGACTGCGTTGAAACGGTCCGTT
Taxon4 AAGACCCAGACTGTAACTTGTCGGACTGGAATGGACCGTT
4 40
Taxon1 CAGCTGAGGCATTACGAAGCGCCCGGCCGGCCGGACGATT
Taxon2 TAACTGAGACAGTACGAAACGCGCAATGGGCCCCAAAACC
Taxon3 CGGTTAGGACATGACGAATCGCCCAGCGGGCCTCCGGACC
Taxon4 CAACTGGAATGTTACCAGCTGCCTAGGGTGCTCCGAGGAC
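Because several data sets are written one after another, downstream scripts need to split them. The Python sketch below reads output laid out like the example above. It assumes sequential format, one sequence per line, and names separated from the sequence by whitespace. That holds for relaxed PHYLIP and for names shorter than 10 characters, but not for strict PHYLIP names of exactly 10 characters. The function name is hypothetical.

```python
def read_phylip_datasets(text):
    """Split concatenated PHYLIP data sets into a list of {name: sequence} dicts."""
    lines = [line for line in text.splitlines() if line.strip()]
    datasets, i = [], 0
    while i < len(lines):
        n_taxa, length = (int(x) for x in lines[i].split())   # header: "<taxa> <sites>"
        block = {}
        for line in lines[i + 1:i + 1 + n_taxa]:
            name, seq = line.split(None, 1)
            block[name] = seq.replace(" ", "")
            if len(block[name]) != length:
                raise ValueError(f"{name}: expected {length} sites, found {len(block[name])}")
        datasets.append(block)
        i += 1 + n_taxa
    return datasets

sample = "2 4\nTaxon1 ACGT\nTaxon2 ACGA\n2 4\nTaxon1 TTGT\nTaxon2 TTGA\n"
datasets = read_phylip_datasets(sample)
```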

References

Adachi, J. and Hasegawa, M. (1996) Model of amino acid substitution in proteins encoded by mitochondrial DNA. J. Mol. Evol., 42, 459-468.

Adachi, J., Waddell, P. J., Martin, W. and Hasegawa, M. (2000) Plastid genome phylogeny and a model of amino acid substitution for proteins encoded by chloroplast DNA. J. Mol. Evol., 50, 348-358.

Dayhoff, M. O., Schwartz, R. M. and Orcutt, B. C. (1978) A model of evolutionary change in proteins. In Dayhoff, M. O. (ed.) Atlas of Protein Sequence and Structure, Vol. 5, Suppl. 3, National Biomedical Research Foundation, Washington DC, pp. 345-352.

Felsenstein, J. (1981) Evolutionary trees from DNA sequences: a maximum likelihood approach. J. Mol. Evol., 17, 368-376.

Felsenstein, J. (1993) Phylogeny Inference Package (PHYLIP), Version 3.5. Department of Genetics, University of Washington, Seattle.

Felsenstein, J. and Churchill, G. (1996) A hidden Markov model approach to variation among sites in rate of evolution. Mol. Biol. Evol., 13, 93-104.

Hasegawa, M., Kishino, H. and Yano, T. (1985) Dating the human-ape splitting by a molecular clock of mitochondrial DNA. J. Mol. Evol., 22, 160-174.

Henikoff, S. and Henikoff, J. G. (1992) ???. Proc. Natl. Acad. Sci. USA, 89, 10915-10919.

Jones, D. T., Taylor, W. R. and Thornton, J. M. (1992) The rapid generation of mutation data matrices from protein sequences. CABIOS, 8, 275-282.

Jukes, T. H. and Cantor, C. R. (1969) Evolution of protein molecules. In Munro, H. N. (ed.) Mammalian Protein Metabolism, Academic Press, New York, pp. 21-123.

Kimura, M. (1980) A simple method for estimating evolutionary rates of base substitutions through comparative studies of nucleotide sequences. J. Mol. Evol., 16, 111-120.

Swofford, D. L. (1993) Phylogenetic analysis using parsimony (PAUP), Version 3.1.1. Illinois Natural History Survey, Champaign.

Thorne, J. L., Kishino, H. and Felsenstein, J. (1992) Inching toward reality: An improved likelihood model of sequence evolution. J. Mol. Evol., 34, 3-16.

Whelan, S. and Goldman, N. (2001) A general empirical model of protein evolution derived from multiple protein families using a maximum-likelihood approach. Mol. Biol. Evol., 18, 691-699.

Yang, Z. (1993) Maximum-likelihood estimation of phylogeny from DNA sequences when substitution rates differ over sites. Mol. Biol. Evol., 10, 1396-1401.

Yang, Z. (1994) Estimating the pattern of nucleotide substitution. J. Mol. Evol., 39, 105-111.

Yang, Z. (1996) Among-site rate variation and its impact on phylogenetic analyses. Trends Ecol. Evol., 11, 367-372.

Version History.

Bug fixed in version 1.3.2 - 7 Jan 2005

Bug fixed in version 1.3.1 - 4 Nov 2004

New Features in version 1.3 - 30 Aug 2004

New Features in version 1.2.7 - 19 Nov 2003

New Features in version 1.2.6 - 4 Dec 2002

New Features in version 1.2.5 - 25 Sep 2001

New Features in version 1.2.4 - 6 Jul 2001

Bug fixed which resulted in missing 'Begin Data' in NEXUS files when creating a single set of sequences.

New Features in version 1.2.3 - 6 Apr 2001

Fixed a bug in the Macintosh version which caused the end of a long command line to be ignored.

Bug fixed in version 1.22 - 4 Feb 2001

Fixed a bug which prevented unrooted trees from loading (complained about polytomies in the tree).

Bug fixed in version 1.21

Fixed a bug which prevented single partitions being simulated (i.e. most people's simulations). Updated the makefile in the UNIX version.

New Features in version 1.2

Bug fixed in version 1.1

WARNING: a very important bug has been fixed in this release. Many apologies to anyone to whom this is relevant. All versions of Seq-Gen prior to this did not reassign the gamma rate categories for each site between replicate simulations. This means that the same site had the same rate (in both the discrete and continuous models) in every replicate, which reduces the variability among a set of simulations.
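The version 1.1 fix can be shown with a minimal C sketch. It assumes the discrete gamma model, in which each of the numCats rate categories is equally likely. Under that model, assigning a site a category is a uniform draw from 0 to numCats-1, and the fix amounts to repeating that draw for every site at the start of each replicate. `NextRandom`, `RandomCategory` and `AssignSiteCategories` are hypothetical names and do not come from the Seq-Gen source. The xorshift32 generator is only a compact stand-in for the Mersenne Twister that Seq-Gen uses.

```c
#include <stdint.h>

/* Hypothetical xorshift32 generator (Marsaglia's 13/17/5 triple),
   standing in for Seq-Gen's Mersenne Twister. State must be non-zero. */
static uint32_t rngState = 2463534242u;

static uint32_t NextRandom(void)
{
	uint32_t x = rngState;
	x ^= x << 13;
	x ^= x >> 17;
	x ^= x << 5;
	rngState = x;
	return x;
}

/* Uniform category index in [0, numCats). Under the discrete gamma model
   every category has probability 1/numCats; the modulo bias is at most
   numCats/2^32 and is zero when numCats is a power of two. */
static int RandomCategory(int numCats)
{
	return (int)(NextRandom() % (uint32_t)numCats);
}

/* Draw a fresh rate category for every site. Intended to be called once
   per replicate, so a site's rate can differ between replicates. */
static void AssignSiteCategories(int *siteCat, int numSites, int numCats)
{
	int i;

	for (i = 0; i < numSites; i++)
		siteCat[i] = RandomCategory(numCats);
}
```

Calling `AssignSiteCategories` inside the replicate loop gives each replicate new per-site rates. Calling it only once, before the loop, reproduces the pre-1.1 behaviour, in which a site keeps the same rate in every replicate.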

Seq-Gen-1.3.4/documentation/Using_Seq-Gen_with_PAUP

Seq-Gen by Andrew Rambaut and Nick Grassly

Equivalent commands in PAUP and Seq-Gen:

PAUP                                                   Seq-Gen
======================================================================================
lset nst=2 variant=hky tratio=2.0;                     -mhky -t2.0
lset nst=6 rmat=(1.0 2.0 3.0 4.0 5.0);                 -mrev -r1.0,2.0,3.0,4.0,5.0,1.0
charpartition codon=1st:1-.\3,2nd:2-.\3,3rd:3-.\3;     -c0.3,0.2,2.5
charpartition genes=RNA:1-300,CDS:301-1000;            -p3 (plus rates in treefile)

Seq-Gen-1.3.4/documentation/Version_History

Version History.

Version 1.3.2 - 7 January 2005
	The PAM (Dayhoff) and Blosum matrices were interchanged, so specifying one resulted in the other being used instead.

Version 1.3.1 - 4 November 2004
	Fixed a problem where specified nucleotide frequencies were being ignored and equal frequencies were used.

Version 1.3 - 30 August 2004
	Added amino acid simulation to Seq-Gen. This replaces PSeq-Gen, which was not being updated, and also adds a number of other amino acid models.
	Removed the limit on tree size. The only limit now is the available memory.
	Updated to the latest version of the MT19937 random number generator.

Version 1.27 - 19 November 2003
	Replaced the random number generator with the high-quality Mersenne Twister. When Seq-Gen was originally written, the computation of random numbers was a significant burden, but with the speed of current computers this is no longer an issue.

Version 1.26 - 4 December 2002
	Recompiled to run natively under MacOS X.
	Improved the resolution of the automatic seeding of the random number generator by adding milliseconds to the seed, so runs of Seq-Gen started less than a second apart will have different seeds. This probably only matters on UNIX machines using scripts to do multiple (short) runs.

Version 1.25 - 25 September 2001
	New option to write the rates used for each site (-wr option). Writing ancestral sequences now uses the -wa option.
	The improved 'interface' for the Mac version is not compiling properly, so I have gone back to the old one. I hope to fix this soon.

Version 1.24 - 6 July 2001
	Can now specify a relative rate for each partition. The partitions are specified in the tree files, but all the partitions can be given the same tree with different rates.
	Bug fixed which resulted in missing 'Begin Data' in NEXUS files when creating a single set of sequences.

Version 1.23 - 5 April 2001
	Added a feature to write ancestral sequences (-w option).
	Improved the interface of the Macintosh version. Can now drag a tree onto the application.
	New Carbon version that will run on MacOS X and MacOS 9.0.

Version 1.22 - 4 Feb 2001
	Fixed a bug which prevented unrooted trees from loading (complained about polytomies in the tree).

Version 1.21 - 19 Dec 2000
	Fixed a bug which prevented single partitions being simulated (i.e. most people's simulations).
	Updated the makefile in the UNIX version.

Version 1.2 - 5 Dec 2000
	New Features:
	Invariable sites model.
	Different trees for different partitions of the data. This allows simulation of recombinant histories.
	Can specify the random number seed.
	Can write NEXUS format (replacing the need for phy2nex).

Version 1.1 - 14 Dec 1998
	All versions of Seq-Gen prior to this did not reassign the gamma rate categories for each site between replicate simulations. This means that the same site had the same rate (in both the discrete and continuous models) in every replicate, which reduces the variability among a set of simulations.

Version 1.06 - 29 July 1998
	Added the ability to load an alignment of sequences and use one of them as an ancestral sequence.

Version 1.05 - Can't remember when.
	Fixed a small cosmetic bug in the Mac version.

Version 1.04 - 14 Jan 1997
	Removed the Numerical Recipes random number generator and inserted one from Yang's PAML code (from tools.c in PAML).
	Fixed the UNIX Makefile.

Version 1.03 - 6 Dec 1996
	Fixed a problem with calculating kappa from the ts/tv ratio, and vice versa, in the HKY model when the base frequencies are unequal. This mirrors a problem fixed in release 4d52 of PAUP*. Thus, a ts/tv ratio calculated with versions of PAUP* before 4d52 would give the right kappa in versions of Seq-Gen before version 1.03.

Version 1.02 - 16 Nov 1996
	Added a discrete gamma rate heterogeneity function. This will be faster than the default continuous version. A maximum of 32 discrete categories is allowed. Yang (1994) estimates that 4 categories give a good approximation.

Version 1.01 - 8 Nov 1996
	Fixed a problem with the REV model and gamma rate heterogeneity. Do not use the previous version with this combination.

Version 1.0
	First distributed version.

#include
#include
#include
#include "eigen.h"
#include "aamodels.h"

char *aminoAcids="ARNDCQEGHILKMFPSTWYV";

int aaFreqSet;
int aaModel = AA_NONE;
double aaFreq[NUM_AA];
double aaAddFreq[NUM_AA];
double aaMatrix[MAX_RATE_CATS][SQNUM_AA];
double aaVector[NUM_AA];
double aaRelativeRate[NUM_AA_REL_RATES];

static double Qij[SQNUM_AA], Cijk[CUNUM_AA], Root[NUM_AA];

void SetupAAMatrix();
void SetRelativeRates(double *inRelativeRates);
void SetFrequencies(double *inFrequencies);
void CheckAAFrequencies();

/* JTT model for amino acid evolution */
/* D.T. Jones, W.R. Taylor, and J.M.
Thornton */ /* The rapid generation of mutation data matrices from protein sequences */ /* CABIOS vol. 8 no. 3 1992 pp. 275-282 */ static double jttRelativeRates[NUM_AA_REL_RATES] = { 0.531678, 0.557967, 0.827445, 0.574478, 0.556725, 1.066681, 1.740159, 0.219970, 0.361684, 0.310007, 0.369437, 0.469395, 0.138293, 1.959599, 3.887095, 4.582565, 0.084329, 0.139492, 2.924161, 0.451095, 0.154899, 1.019843, 3.021995, 0.318483, 1.359652, 3.210671, 0.239195, 0.372261, 6.529255, 0.431045, 0.065314, 0.710489, 1.001551, 0.650282, 1.257961, 0.235601, 0.171995, 5.549530, 0.313311, 0.768834, 0.578115, 0.773313, 4.025778, 0.491003, 0.137289, 2.529517, 0.330720, 0.073481, 0.121804, 5.057964, 2.351311, 0.027700, 0.700693, 0.164525, 0.105625, 0.521646, 7.766557, 1.272434, 1.032342, 0.115968, 0.061486, 0.282466, 0.190001, 0.032522, 0.127164, 0.589268, 0.425159, 0.057466, 0.453952, 0.315261, 0.091304, 0.053907, 0.546389, 0.724998, 0.150559, 0.164593, 0.049009, 0.409202, 0.678335, 0.123653, 2.155331, 0.469823, 1.104181, 2.114852, 0.621323, 3.417706, 0.231294, 5.684080, 0.078270, 0.709004, 2.966732, 0.456901, 0.045683, 1.608126, 0.548807, 0.523825, 0.172206, 0.254745, 0.179771, 1.115632, 0.243768, 0.111773, 0.097485, 1.731684, 0.175084, 0.043829, 0.191994, 0.312449, 0.331584, 0.114381, 0.063452, 0.465271, 0.201696, 0.053769, 0.069492, 0.269840, 0.130379, 0.050212, 0.208081, 1.874296, 0.316862, 0.544180, 0.052500, 0.470140, 0.181788, 0.540571, 0.525096, 0.329660, 0.453428, 1.141961, 0.743458, 0.477355, 0.128193, 5.848400, 0.121827, 2.335139, 0.202562, 4.831666, 0.777090, 0.098580, 0.405119, 2.553806, 0.134510, 0.303445, 9.533943, 0.146481, 3.856906, 2.500294, 1.060504, 0.592511, 0.272514, 0.530324, 0.241094, 1.761439, 0.624581, 0.024521, 0.216345, 0.474478, 0.965641, 0.089134, 0.087904, 0.124066, 0.436181, 0.164215, 0.285564, 2.114728, 0.201334, 0.189870, 3.038533, 0.148483, 0.943971, 0.138904, 0.537922, 5.484236, 0.593478, 2.788406, 1.176961, 0.069965, 0.113850, 0.211561, 4.777647, 
0.310927, 0.628608, 0.408532, 0.080556, 0.201094, 1.14398, 0.747889, 0.239697, 0.165473 }; static double jttFrequencies[NUM_AA] = { 0.076862, 0.051057, 0.042546, 0.051269, 0.020279, 0.041061, 0.061820, 0.074714, 0.022983, 0.052569, 0.091111, 0.059498, 0.023414, 0.040530, 0.050532, 0.068225, 0.058518, 0.014336, 0.032303, 0.066374 }; /* WAG model of amino acid evolution */ /* Whelan, S. and Goldman, N. (2001) A general empirical model of protein */ /* evolution derived from multiple protein families using a maximum-likelihood */ /* approach. Mol. Biol. Evol. 18, 691699. */ static double wagRelativeRates[NUM_AA_REL_RATES] = { 0.610810, 0.569079, 0.821500, 1.141050, 1.011980, 1.756410, 1.572160, 0.354813, 0.219023, 0.443935, 1.005440, 0.989475, 0.233492, 1.594890, 3.733380, 2.349220, 0.125227, 0.268987, 2.221870, 0.711690, 0.165074, 0.585809, 3.360330, 0.488649, 0.650469, 2.362040, 0.206722, 0.551450, 5.925170, 0.758446, 0.116821, 0.753467, 1.357640, 0.613776, 1.294610, 0.423612, 0.280336, 6.013660, 0.296524, 1.716740, 1.056790, 1.253910, 4.378930, 0.615636, 0.147156, 3.334390, 0.224747, 0.110793, 0.217538, 4.394450, 2.257930, 0.078463, 1.208560, 0.221176, 0.033379, 0.691268, 6.833400, 0.961142, 1.032910, 0.043523, 0.093930, 0.533362, 0.116813, 0.052004, 0.472601, 1.192810, 0.417372, 0.146348, 0.363243, 0.169417, 0.109261, 0.023920, 0.341086, 0.275403, 0.189890, 0.428414, 0.083649, 0.437393, 0.441300, 0.122303, 1.560590, 0.570186, 0.795736, 0.604634, 1.114570, 6.048790, 0.366510, 4.749460, 0.131046, 0.964886, 4.308310, 1.705070, 0.110744, 1.036370, 1.141210, 0.954144, 0.243615, 0.252457, 0.333890, 0.630832, 0.635025, 0.141320, 0.172579, 2.867580, 0.353912, 0.092310, 0.755791, 0.782467, 0.914814, 0.172682, 0.217549, 0.655045, 0.276379, 0.034151, 0.068651, 0.415992, 0.194220, 0.055288, 0.273149, 1.486700, 0.251477, 0.374321, 0.114187, 0.209108, 0.152215, 0.555096, 0.992083, 0.450867, 0.756080, 0.771387, 0.822459, 0.525511, 0.289998, 4.290350, 0.131869, 3.517820, 
0.360574, 4.714220, 1.177640, 0.111502, 0.353443, 1.615050, 0.234326, 0.468951, 8.659740, 0.287583, 5.375250, 2.348200, 0.462018, 0.382421, 0.364222, 0.740259, 0.443205, 1.997370, 1.032220, 0.098843, 0.619503, 1.073780, 1.537920, 0.152232, 0.147411, 0.342012, 1.320870, 0.194864, 0.556353, 1.681970, 0.570369, 0.473810, 2.282020, 0.179896, 0.606814, 0.191467, 1.699780, 7.154480, 0.725096, 1.786490, 0.885349, 0.156619, 0.239607, 0.351250, 4.847130, 0.578784, 0.872519, 0.258861, 0.126678, 0.325490, 1.547670, 2.763540, 0.409817, 0.347826 }; static double wagFrequencies[NUM_AA] = { 0.0866, 0.0440, 0.0391, 0.0570, 0.0193, 0.0367, 0.0581, 0.0833, 0.0244, 0.0485, 0.0862, 0.0620, 0.0195, 0.0384, 0.0458, 0.0695, 0.0610, 0.0144, 0.0353, 0.0709 }; /* LG model of amino acid evolution */ /* Le, S. and Gascuel, O. (2008) An improved Amino Acid Replacement Matrix */ /* Mol. Biol. Evol. 25(7):1307-1320. */ static double lgRelativeRates[NUM_AA_REL_RATES] = { 0.425093, 0.276818, 0.395144, 2.489084, 0.969894, 1.038545, 2.066040, 0.358858, 0.149830, 0.395337, 0.536518, 1.124035, 0.253701, 1.177651, 4.727182, 2.139501, 0.180717, 0.218959, 2.547870, 0.751878, 0.123954, 0.534551, 2.807908, 0.363970, 0.390192, 2.426601, 0.126991, 0.301848, 6.326067, 0.484133, 0.052722, 0.332533, 0.858151, 0.578987, 0.593607, 0.314440, 0.170887, 5.076149, 0.528768, 1.695752, 0.541712, 1.437645, 4.509238, 0.191503, 0.068427, 2.145078, 0.371004, 0.089525, 0.161787, 4.008358, 2.000679, 0.045376, 0.612025, 0.083688, 0.062556, 0.523386, 5.243870, 0.844926, 0.927114, 0.010690, 0.015076, 0.282959, 0.025548, 0.017416, 0.394456, 1.240275, 0.425860, 0.029890, 0.135107, 0.037967, 0.084808, 0.003499, 0.569265, 0.640543, 0.320627, 0.594007, 0.013266, 0.893680, 1.105251, 0.075382, 2.784478, 1.143480, 0.670128, 1.165532, 1.959291, 4.128591, 0.267959, 4.813505, 0.072854, 0.582457, 3.234294, 1.672569, 0.035855, 0.624294, 1.223828, 1.080136, 0.236199, 0.257336, 0.210332, 0.348847, 0.423881, 0.044265, 0.069673, 1.807177, 
0.173735, 0.018811, 0.419409, 0.611973, 0.604545, 0.077852, 0.120037, 0.245034, 0.311484, 0.008705, 0.044261, 0.296636, 0.139538, 0.089586, 0.196961, 1.739990, 0.129836, 0.268491, 0.054679, 0.076701, 0.108882, 0.366317, 0.697264, 0.442472, 0.682139, 0.508851, 0.990012, 0.584262, 0.597054, 5.306834, 0.119013, 4.145067, 0.159069, 4.273607, 1.112727, 0.078281, 0.064105, 1.033739, 0.111660, 0.232523, 10.649107, 0.137500, 6.312358, 2.592692, 0.249060, 0.182287, 0.302936, 0.619632, 0.299648, 1.702745, 0.656604, 0.023918, 0.390322, 0.748683, 1.136863, 0.049906, 0.131932, 0.185202, 1.798853, 0.099849, 0.346960, 2.020366, 0.696175, 0.481306, 1.898718, 0.094464, 0.361819, 0.165001, 2.457121, 7.803902, 0.654683, 1.338132, 0.571468, 0.095131, 0.089613, 0.296501, 6.472279, 0.248862, 0.400547, 0.098369, 0.140825, 0.245841, 2.188158, 3.151815, 0.189510, 0.249313 }; static double lgFrequencies[NUM_AA] = { 0.079066, 0.055941, 0.041977, 0.053052, 0.012937, 0.040767, 0.071586, 0.057337, 0.022355, 0.062157, 0.099081, 0.064600, 0.022951, 0.042302, 0.044040, 0.061197, 0.053287, 0.012066, 0.034155, 0.069147 }; /* Dayhoff, M.O., Schwartz, R.M., Orcutt, B.C. (1978)*/ /* A model of evolutionary change in proteins.*/ /* Dayhoff, M.O. (ed.) Atlas of Protein Sequence Structur., Vol5, Suppl. 3,*/ /* National Biomedical Research Foundation, Washington DC, pp. 
345-352.*/ static double dayhoffRelativeRates[NUM_AA_REL_RATES] = { 0.267828, 0.984474, 1.199805, 0.360016, 0.887753, 1.961167, 2.386111, 0.228116, 0.653416, 0.406431, 0.258635, 0.717840, 0.183641, 2.485920, 4.051870, 3.680365, 0.000000, 0.244139, 2.059564, 0.327059, 0.000000, 0.232374, 2.439939, 0.000000, 0.087791, 2.383148, 0.632629, 0.154924, 4.610124, 0.896321, 0.136906, 1.028313, 1.531590, 0.265745, 2.001375, 0.078012, 0.240368, 8.931515, 0.000000, 1.028509, 1.493409, 1.385352, 5.290024, 0.768024, 0.341113, 3.148371, 0.000000, 0.138503, 0.419244, 4.885892, 2.271697, 0.224968, 0.946940, 0.158067, 0.00000, 1.348551, 11.388659, 1.240981, 0.868241, 0.239248, 0.000000, 0.716913, 0.000000, 0.000000, 0.133940, 0.956097, 0.660930, 0.000000, 0.000000, 0.178316, 0.000000, 0.000000, 0.107278, 0.282729, 0.438074, 0.000000, 0.000000, 0.000000, 0.000000, 0.187550, 1.598356, 0.162366, 0.000000, 0.953164, 0.484678, 7.086022, 0.281581, 6.011613, 0.180393, 0.730772, 1.519078, 1.127499, 0.000000, 1.526188, 0.561828, 0.525651, 0.000000, 0.000000, 0.346983, 0.811907, 0.439469, 0.609526, 0.112880, 0.830078, 0.304803, 0.000000, 0.507003, 0.793999, 0.340156, 0.000000, 0.214717, 0.36725, 0.106802, 0.000000, 0.071514, 0.267683, 0.170372, 0.153478, 0.347153, 2.322243, 0.306662, 0.000000, 0.000000, 0.538165, 0.076981, 0.443504, 0.270475, 0.000000, 0.475927, 0.933709, 0.353643, 0.226333, 0.270564, 1.265400, 0.438715, 2.556685, 0.460857, 3.332732, 1.951951, 0.119152, 0.247955, 1.900739, 0.000000, 0.374834, 8.810038, 0.180629, 5.230115, 1.565160, 0.316258, 0.171432, 0.331090, 0.461776, 0.286572, 1.745156, 2.411739, 0.000000, 0.335419, 0.954557, 1.350599, 0.000000, 0.132142, 0.10385, 0.921860, 0.170205, 0.619951, 1.031534, 0.000000, 0.000000, 2.565955, 0.110506, 0.459901, 0.136655, 0.762354, 6.952629, 0.123606, 2.427202, 0.782857, 0.000000, 0.000000, 0.485026, 5.436674, 0.740819, 0.336289, 0.303836, 0.000000, 0.417839, 1.561997, 0.608070, 0.000000, 0.279379 }; static double 
dayhoffFrequencies[NUM_AA] = { 0.087127, 0.040904, 0.040432, 0.046872, 0.033474, 0.038255, 0.049530, 0.088612, 0.033619, 0.036886, 0.085357, 0.080481, 0.014753, 0.039772, 0.050680, 0.069577, 0.058542, 0.010494, 0.029916, 0.064718 }; /* BLOSUM62 model of amino acid evolution */ /* Henikoff, S., and J. G. Henikoff. 1992. PNAS USA 89:10915-10919.*/ static double blosumRelativeRates[NUM_AA_REL_RATES] = { 7.3579038969751e-01, 4.8539105546575e-01, 5.4316182089867e-01, 1.4599953104700e+00, 1.1997057046020e+00, 1.1709490427999e+00, 1.9558835749595e+00, 7.1624144499779e-01, 6.0589900368677e-01, 8.0001653051838e-01, 1.2952012667833e+00, 1.2537582666635e+00, 4.9296467974759e-01, 1.1732759009239e+00, 4.3250926870566e+00, 1.7291780194850e+00, 4.6583936772479e-01, 7.1820669758623e-01, 2.1877745220045e+00, 1.2974467051337e+00, 5.0096440855513e-01, 2.2782657420895e-01, 3.0208336100636e+00, 1.3605741904203e+00, 4.1876330851753e-01, 1.4561411663360e+00, 2.3203644514174e-01, 6.2271166969249e-01, 5.4111151414889e+00, 9.8369298745695e-01, 3.7164469320875e-01, 4.4813366171831e-01, 1.1227831042096e+00, 9.1466595456337e-01, 4.2638231012175e-01, 7.2051744121611e-01, 4.3838834377202e-01, 3.1801000482161e+00, 3.9735894989702e-01, 1.8392161469920e+00, 1.2404885086396e+00, 1.3558723444845e+00, 2.4145014342081e+00, 2.8301732627800e-01, 2.1188815961519e-01, 1.5931370434574e+00, 6.4844127878707e-01, 3.5486124922252e-01, 4.9488704370192e-01, 2.9041016564560e+00, 1.8981736345332e+00, 1.9148204624678e-01, 5.3822251903674e-01, 3.1285879799342e-01, 2.4083661480204e-01, 1.1909457033960e+00, 3.7616252083685e+00, 7.9847324896839e-01, 7.7814266402188e-01, 4.1855573246161e-01, 2.1813157759360e-01, 1.0324479249521e+00, 2.2262189795786e-01, 2.8173069420651e-01, 7.3062827299842e-01, 1.5827541420653e+00, 9.3418750943056e-01, 1.4534504627853e-01, 2.6142220896504e-01, 2.5812928941763e-01, 3.2980150463028e-01, 1.4074889181440e-01, 4.1820319228376e-01, 3.5405810983129e-01, 7.7489402279418e-01, 8.3184264014158e-01, 
2.8507880090648e-01, 7.6768882347954e-01, 4.4133747118660e-01, 3.5600849876863e-01, 1.1971884150942e+00, 1.1198313585160e+00, 5.2766441887169e-01, 4.7023773369610e-01, 1.1163524786062e+00, 5.5289191779282e+00, 6.0984630538281e-01, 2.4353411311401e+00, 2.3620245120365e-01, 5.8073709318144e-01, 3.9452776745146e+00, 2.4948960771127e+00, 1.4435695975031e-01, 8.5857057567418e-01, 1.9348709245965e+00, 1.2774802945956e+00, 7.5865380864172e-01, 9.5898974285014e-01, 5.3078579012486e-01, 4.2357999217628e-01, 1.6268910569817e+00, 1.8684804693170e-01, 3.7262517508685e-01, 2.8024271516787e+00, 5.5541539747043e-01, 2.9140908416530e-01, 9.2656393484598e-01, 1.7698932389373e+00, 1.0710972360073e+00, 4.0763564893830e-01, 5.9671930034577e-01, 5.2425384633796e-01, 5.3985912495418e-01, 1.8929629237636e-01, 2.1772115923623e-01, 7.5204244030271e-01, 4.5943617357855e-01, 3.6816646445253e-01, 5.0408659952683e-01, 1.5093262532236e+00, 6.4143601140497e-01, 5.0835892463812e-01, 3.0805573703500e-01, 2.5334079019018e-01, 2.5271844788492e-01, 3.4807220979697e-01, 1.0225070358890e+00, 9.8431152535870e-01, 7.1453370392764e-01, 5.2700733915060e-01, 1.1170297629105e+00, 5.8540709022472e-01, 3.0124860078016e-01, 4.2189539693890e+00, 2.0155597175031e-01, 3.8909637733035e+00, 4.0619358664202e-01, 3.3647977631042e+00, 1.5173593259539e+00, 3.8835540920564e-01, 3.5754441245967e-01, 1.1790911972601e+00, 3.4198578754023e-01, 6.7461709322842e-01, 8.3118394054582e+00, 4.4557027426059e-01, 6.0305593795716e+00, 2.0648397032375e+00, 3.7455568747097e-01, 3.5296918452729e-01, 9.1525985769421e-01, 6.9147463459998e-01, 8.1124585632307e-01, 2.2314056889131e+00, 1.0730611843319e+00, 2.6692475051102e-01, 1.0473834507215e+00, 1.7521659178195e+00, 1.3038752007987e+00, 3.3224304063396e-01, 7.1799348690032e-01, 4.9813847530407e-01, 1.7738551688305e+00, 4.5412362510273e-01, 9.1872341574605e-01, 1.4885480537218e+00, 8.8810109815193e-01, 9.5168216224591e-01, 2.5758507553153e+00, 2.3359790962888e-01, 5.4002764482413e-01, 
4.8820611879305e-01, 2.0743248934965e+00, 6.7472604308008e+00, 8.3811961017754e-01, 1.1691295777157e+00, 1.0054516831488e+00, 2.5221483002727e-01, 3.6940531935451e-01, 4.9690841067567e-01, 5.1515562922704e+00, 3.8792562209837e-01, 7.9675152076106e-01, 5.6192545744165e-01, 5.1312812689059e-01, 8.0101024319939e-01, 2.2530740511763e+00, 4.0544190065580e+00, 2.6650873142646e-01, 1.0000000000000e+00 }; static double blosumFrequencies[NUM_AA] = { 0.074, 0.052, 0.045, 0.054, 0.025, 0.034, 0.054, 0.074, 0.026, 0.068, 0.099, 0.058, 0.025, 0.047, 0.039, 0.057, 0.051, 0.013, 0.032, 0.073 }; /* mtRev model - complete sequence data of mtDNA from 24 vertebrate species */ /* Adachi, J., and Hasegawa, M. 1996. J. Mol. Evol. 42:459-468. */ static double mtrevRelativeRates[NUM_AA_REL_RATES] = { 1.2199217606346e+01, 1.4182139942122e+01, 9.2985091873208e+00, 3.1542792981957e+01, 1.0025852846688e+00, 5.1418866803338e+00, 6.3531246495131e+01, 7.3137132861715e+00, 5.0782382656186e+01, 1.3399741808481e+01, 4.4021672780560e+00, 7.4673480520104e+01, 3.3513021631978e+00, 2.8582502221773e+01, 2.0413623195312e+02, 2.5301305153906e+02, 1.0000000000000e+00, 3.4084158197615e+00, 1.0266468401249e+02, 6.9661274444534e+00, 1.0000000000000e+00, 5.4384584796568e+01, 1.1631134513343e+02, 1.0000000000000e+00, 1.2122831341194e+01, 8.6961067087353e+01, 1.0000000000000e+00, 8.1976829394538e+00, 7.4423215395318e+01, 1.0000000000000e+00, 2.4659158338099e+00, 1.2439947713615e+01, 3.1791814866372e+00, 1.0935327216119e+00, 1.1550775790126e+01, 1.0000000000000e+00, 4.0211417480338e+00, 4.1809325468160e+02, 3.1020979842967e+01, 9.1349622725361e+01, 3.3185663516310e+01, 2.8052324651124e+01, 2.6112087577885e+02, 1.4261453863336e+01, 7.9775653461977e+00, 3.2036829276162e+02, 3.4424354918739e+01, 7.9996445145608e+00, 3.8586541461044e+01, 2.6020426225852e+02, 1.2550758780474e+02, 5.6207759736659e+00, 1.0071406219571e+02, 1.0000000000000e+00, 1.0000000000000e+00, 2.9097352675564e+01, 3.0713149855302e+02, 
2.9877072751897e+01, 5.9995408885817e+01, 2.2827096245105e+00, 1.0000000000000e+00, 1.2183938185384e+00, 1.0000000000000e+00, 2.6221929413096e+00, 7.0708004204733e+00, 3.6327934317139e+01, 1.4743408713748e+01, 1.0453246057102e+01, 1.1165627147496e+01, 1.0000000000000e+00, 3.9599394038972e+01, 1.0000000000000e+00, 1.6163581056674e+01, 7.4467985406234e+01, 3.3018175376623e+01, 1.3500725995091e+01, 1.0000000000000e+00, 3.2504095376923e+00, 3.7264767083096e+01, 1.6454136037822e+01, 1.4581783243113e+02, 9.4720031458442e+01, 1.7684087896962e+01, 1.3409157685926e+02, 1.0000000000000e+00, 1.6503249008836e+02, 3.5530760735494e+00, 3.0652523140859e+02, 4.3905393139325e+00, 2.0895470525345e+01, 2.4504076430724e+02, 2.4931300477797e+01, 1.0059428264289e+01, 7.2256314165467e+01, 2.8480937892158e+01, 4.9962974409828e+01, 1.0000000000000e+00, 2.0430790980529e+01, 9.9986289000676e+00, 1.4884496769963e+01, 2.5853576435567e+01, 1.7418201388328e+00, 1.0000000000000e+00, 1.6519126809071e+02, 1.0000000000000e+00, 1.4067850525292e+00, 6.7547121641947e+00, 2.8794794140840e+01, 7.8001372062558e+00, 1.0000000000000e+00, 6.9067239183061e+00, 1.1127702362585e+01, 1.0000000000000e+00, 3.1466649021550e+00, 1.2699794194865e+00, 1.1962111069278e+01, 1.0000000000000e+00, 1.0000000000000e+00, 1.0000000000000e+00, 6.6277950574411e+01, 5.8800079133028e+00, 5.7494182626674e+00, 1.6887657206208e+00, 1.3320553471351e+00, 6.4536986087271e+00, 6.0472584534958e+00, 6.7197196398961e+01, 6.2977633277779e+00, 2.5347805183364e+01, 3.2089868698728e+01, 4.0766987134407e+01, 2.3570850628539e+01, 3.7286635325194e+00, 3.5270764890474e+02, 1.0000000000000e+00, 1.7320653206333e+02, 1.0298655619743e+01, 2.7262244199514e+02, 4.4561065036310e+01, 1.0856482766156e+01, 2.5107659603898e+01, 1.9391167162525e+02, 1.0000000000000e+00, 1.3161329199391e+01, 6.4365086389428e+02, 7.8314019154706e+00, 2.8290920517725e+02, 1.1371735519833e+02, 2.1105885757279e+01, 3.8741359395934e+01, 6.6524559321657e+01, 1.7071378554833e+01, 
2.3234516108847e+01, 4.8247261078055e+01, 4.8092094826036e+01, 3.3887559483420e+00, 2.6368577564199e+01, 5.5679895711418e+01, 7.1750284708933e+01, 1.2631893872825e+01, 2.6932728996777e+01, 1.0000000000000e+00, 4.7798798034572e+01, 9.9165053447429e+00, 5.8505442466161e+01, 2.7798190504760e+02, 1.1427000119701e+01, 2.1029990530586e+01, 2.0397078683768e+02, 9.1089574817139e+00, 3.3835737720574e+01, 1.7815549567056e+01, 4.1272404968214e+00, 2.4504156395152e+02, 3.3435675442163e+00, 8.9421193040709e+01, 6.7485067008375e+01, 2.2161693733113e+00, 8.5338209390745e+00, 4.3342126659660e+00, 3.1432036618746e+02, 2.0305343047059e+01, 3.4167877957799e+01, 1.0000000000000e+00, 5.2559565123081e+00, 2.0382362288681e+01, 1.0765527137500e+02, 1.3814733274637e+01, 2.8259139240676e+00, 1.0000000000000e+00 }; static double mtrevFrequencies[NUM_AA] = { 0.072, 0.019, 0.039, 0.019, 0.006, 0.025, 0.024, 0.056, 0.028, 0.088, 0.168, 0.023, 0.054, 0.061, 0.054, 0.072, 0.086, 0.029, 0.033, 0.043 }; /* CPREV 45 model of amino acid evolution */ /* Adachi, J., P.J. Waddell, W. Martin, and M. Hasegawa. 2000. 
JME 50:348-358 */ static double cprevRelativeRates[NUM_AA_REL_RATES] = { -105, -227, -175, -669, -157, -499, -665, -66, -145, -197, -236, -185, -68, -490, -2440, -1340, -14, -56, -968, -357, -43, -823, -1745, -152, -243, -715, -136, -203, -4482, -125, -53, -87, -385, -314, -230, -323, -92, -4435, -538, -768, -1055, -653, -1405, -168, -113, -2430, -61, -97, -173, -2085, -1393, -40, -754, -83, -10, -400, -3691, -431, -331, -10, -10, -412, -47, -22, -170, -590, -266, -18, -281, -75, -10, -10, -303, -441, -280, -396, -48, -159, -726, -285, -2331, -576, -435, -1466, -592, -3122, -133, -1269, -92, -286, -3313, -202, -10, -323, -396, -241, -53, -391, -54, -379, -162, -148, -82, -2629, -113, -145, -185, -568, -369, -63, -142, -200, -19, -40, -20, -263, -21, -25, -28, -691, -92, -82, -10, -91, -29, -66, -305, -10, -127, -152, -303, -32, -69, -1971, -25, -1745, -345, -1772, -454, -117, -216, -1040, -42, -89, -4797, -218, -1351, -1268, -219, -516, -156, -159, -189, -865, -193, -72, -302, -868, -918, -10, -247, -249, -327, -100, -93, -645, -86, -215, -475, -43, -487, -148, -468, -2370, -317, -1202, -260, -49, -97, -122, -2151, -73, -522, -167, -29, -71, -760, -346, -10, -119 }; static double cprevFrequencies[NUM_AA] = { 0.076, 0.062, 0.041, 0.037, 0.009, 0.038, 0.049, 0.084, 0.025, 0.081, 0.101, 0.050, 0.022, 0.051, 0.043, 0.062, 0.054, 0.018, 0.031, 0.066 }; /* MtArt model inserted in following 7 Oct 2011 -- Lars Jermiin (lars.jermiin@csiro.au) */ /* MtArt: A New Model of Amino Acid Replacement for Arthropoda */ /* Abascal, F., Posada, D., and Zardoya, R. 2007. Mol. Biol. Evol. 24:1-5. 
*/ static double mtartRelativeRates[NUM_AA_REL_RATES] = { 0.19, 0.19, 0.59, 253.51, 0.19, 0.19, 199.84, 0.19, 25.69, 3.70, 0.19, 120.64, 13.10, 49.33, 672.97, 243.93, 0.19, 1.18, 339.91, 0.19, 4.28, 35.51, 153.97, 0.19, 0.19, 41.30, 1.81, 1.77, 208.60, 5.18, 4.73, 0.19, 2.66, 0.19, 0.19, 3.88, 0.19, 500.16, 98.18, 261.79, 183.04, 120.53, 179.54, 21.33, 12.65, 467.34, 78.81, 19.72, 16.54, 398.43, 165.89, 7.73, 251.17, 22.60, 10.61, 0.19, 861.83, 12.46, 0.19, 6.60, 1.17, 1.68, 0.19, 0.19, 0.19, 44.35, 0.19, 0.19, 0.19, 0.19, 0.19, 0.19, 80.53, 12.37, 63.00, 78.71, 0.19, 312.30, 184.06, 0.19, 664.16, 182.80, 21.61, 71.99, 350.40, 261.55, 2.64, 313.50, 10.54, 16.28, 349.28, 67.33, 0.19, 39.30, 52.36, 43.68, 6.68, 86.67, 0.19, 43.86, 15.25, 6.81, 1.72, 106.31, 0.19, 0.19, 7.88, 31.50, 43.40, 10.99, 7.74, 13.63, 0.19, 2.74, 1.36, 0.19, 55.71, 0.83, 0.19, 226.03, 0.19, 1.88, 8.65, 2.56, 0.19, 5.54, 0.19, 0.19, 13.78, 0.79, 10.62, 18.59, 0.19, 191.43, 0.19, 514.54, 3.46, 514.78, 117.91, 0.19, 7.15, 203.74, 0.19, 12.33, 1854.52, 3.78, 885.50, 262.57, 12.17, 8.17, 47.75, 21.13, 19.76, 84.72, 105.61, 10.73, 16.82, 144.22, 69.54, 15.97, 117.09, 26.06, 321.61, 5.26, 111.74, 288.57, 70.68, 70.92, 281.29, 14.62, 36.05, 13.54, 53.67, 791.56, 51.90, 86.50, 46.83, 0.19, 18.39, 31.70, 660.39, 2.38, 30.45, 60.59, 0.19, 45.98, 544.14, 37.72, 0.19, 1.59 }; static double mtartFrequencies[NUM_AA] = { 0.054116, 0.018227, 0.039903, 0.020160, 0.009709, 0.018781, 0.024289, 0.068183, 0.024518, 0.092639, 0.148658, 0.021718, 0.061453, 0.088668, 0.041826, 0.091030, 0.049194, 0.029786, 0.039443, 0.057701 }; /*************************************/ void SetAAModel(int theAAModel) { aaModel = theAAModel; /* "AA_MTART" inserted in position 6 of the following switch 7 Oct 2011 -- Lars Jermiin (lars.jermiin@csiro.au) */ switch (aaModel) { case AA_JTT: SetRelativeRates(jttRelativeRates); break; case AA_WAG: SetRelativeRates(wagRelativeRates); break; case AA_DAYHOFF78: 
			SetRelativeRates(dayhoffRelativeRates);
			break;
		case AA_BLOSUM62:
			SetRelativeRates(blosumRelativeRates);
			break;
		case AA_MTREV24:
			SetRelativeRates(mtrevRelativeRates);
			break;
		case AA_CPREV45:
			SetRelativeRates(cprevRelativeRates);
			break;
		case AA_MTART:
			SetRelativeRates(mtartRelativeRates);
			break;
		case AA_LG:
			SetRelativeRates(lgRelativeRates);
			break;
		case AA_GENERAL:
			// relative rates set by user
			break;
	}

	/* "AA_MTART" inserted in position 6 of the following switch 7 Oct 2011 -- Lars Jermiin (lars.jermiin@csiro.au) */
	if (!aaFreqSet) {
		switch (aaModel) {
			case AA_JTT:
				SetFrequencies(jttFrequencies);
				break;
			case AA_WAG:
				SetFrequencies(wagFrequencies);
				break;
			case AA_DAYHOFF78:
				SetFrequencies(dayhoffFrequencies);
				break;
			case AA_BLOSUM62:
				SetFrequencies(blosumFrequencies);
				break;
			case AA_MTREV24:
				SetFrequencies(mtrevFrequencies);
				break;
			case AA_CPREV45:
				SetFrequencies(cprevFrequencies);
				break;
			case AA_MTART:
				SetFrequencies(mtartFrequencies);
				break;
			case AA_LG:
				SetFrequencies(lgFrequencies);
				break;
			case AA_GENERAL:
				// frequencies set by user
				break;
		}
	} else {
		// User set or equal frequencies
		CheckAAFrequencies();
	}

	SetupAAMatrix();
}

void SetRelativeRates(double *inRelativeRate)
{
	int i;
	for (i=0; i maxfreq) {
			maxfreq = aaFreq[i];
			maxi = i;
		}
		sum += aaFreq[i];
	}
	diff = 1.0 - sum;
	aaFreq[maxi] += diff;

	for (i = 0; i < NUM_AA - 1; i++) {
		for (j = i+1; j < NUM_AA; j++) {
			if (aaFreq[i] == aaFreq[j]) {
				aaFreq[i] += MINFDIFF;
				aaFreq[j] -= MINFDIFF;
			}
		}
	}
}

void SetAAMatrix(double *matrix, double len)
{
	int i,j,k;
	double expt[NUM_AA];
	double *P;

	/* P(t)ij = SUM Cijk * exp{Root*t} */
	P=matrix;
	if (len<1e-6) {
		for (i=0; i
#include
#include
#include
#include "eigen.h"

/* Everything below is shamelessly taken from Yang's Paml package */

void balance(double mat[], int n, int *low, int *hi, double scale[]);
void unbalance(int n, double vr[], double vi[], int low, int hi, double scale[]);
int realeig(int job, double mat[], int n,int low, int hi, double valr[], double vali[], double vr[],
double vi[]); void elemhess(int job, double mat[], int n, int low, int hi, double vr[], double vi[], int work[]); typedef struct { double re, im; } complex; #define csize(a) (fabs(a.re)+fabs(a.im)) complex compl (double re,double im); complex cplus (complex a, complex b); complex cminus (complex a, complex b); complex cby (complex a, complex b); complex cdiv (complex a,complex b); complex cexp (complex a); complex cfactor (complex x, double a); int cxtoy (complex x[], complex y[], int n); int cmatby (complex a[], complex b[], complex c[], int n,int m,int k); int cmatout (FILE * fout, complex x[], int n, int m); int cmatinv( complex x[], int n, int m, double space[]); int abyx (double a, double x[], int n) { int i; for (i=0; i=n */ register int i,j,k; int *irow=(int*) space; double ee=1.0e-20, t,t1,xmax; double det=1.0; for (i=0; i=0; i--) { if (irow[i] == i) continue; for (j=0; j(b)?(a):(b)) #define BASE 2 /* base of floating point arithmetic */ #define DIGITS 53 /* no. of digits to the base BASE in the fraction */ #define MAXITER 30 /* max. no. of iterations to converge */ #define pos(i,j,n) ((i)*(n)+(j)) int eigen(int job, double A[], int n, double rr[], double ri[], double vr[], double vi[], double work[]) { /* double work[n*2]: working space */ int low,hi,i,j,k, it, istate=0; double tiny=sqrt(pow((double)BASE,(double)(1-DIGITS))), t; balance(A,n,&low,&hi,work); elemhess(job,A,n,low,hi,vr,vi, (int*)(work+n)); if (-1 == realeig(job,A,n,low,hi,rr,ri,vr,vi)) return (-1); if (job) unbalance(n,vr,vi,low,hi,work); /* sort, added by Z. 
Yang */ for (i=0; itiny) istate=1; } return (istate) ; } /* complex functions */ complex compl (double re,double im) { complex r; r.re = re; r.im = im; return(r); } #define csize(a) (fabs(a.re)+fabs(a.im)) complex cplus (complex a, complex b) { complex c; c.re = a.re+b.re; c.im = a.im+b.im; return (c); } complex cminus (complex a, complex b) { complex c; c.re = a.re-b.re; c.im = a.im-b.im; return (c); } complex cby (complex a, complex b) { complex c; c.re = a.re*b.re-a.im*b.im ; c.im = a.re*b.im+a.im*b.re ; return (c); } complex cdiv (complex a,complex b) { double ratio, den; complex c; if (fabs(b.re) <= fabs(b.im)) { ratio = b.re / b.im; den = b.im * (1 + ratio * ratio); c.re = (a.re * ratio + a.im) / den; c.im = (a.im * ratio - a.re) / den; } else { ratio = b.im / b.re; den = b.re * (1 + ratio * ratio); c.re = (a.re + a.im * ratio) / den; c.im = (a.im - a.re * ratio) / den; } return(c); } complex cexp (complex a) { complex c; c.re = exp(a.re); if (fabs(a.im)==0) c.im = 0; else { c.im = c.re*sin(a.im); c.re*=cos(a.im); } return (c); } complex cfactor (complex x, double a) { complex c; c.re = a*x.re; c.im = a*x.im; return (c); } int cxtoy (complex x[], complex y[], int n) { int i; FOR (i,n) y[i]=x[i]; return (0); } int cmatby (complex a[], complex b[], complex c[], int n,int m,int k) /* a[n*m], b[m*k], c[n*k] ......
c = a*b */ { int i,j,i1; complex t; FOR (i,n) FOR(j,k) { for (i1=0,t=compl(0,0); i1=n */ int i,j,k, *irow=(int*) space; double xmaxsize, ee=1e-20; complex xmax, t,t1; FOR(i,n) { xmaxsize = 0.; for (j=i; j=0; i--) { if (irow[i] == i) continue; FOR(j,n) { t = x[j*m+i]; x[j*m+i] = x[j*m+irow[i]]; x[ j*m+irow[i]] = t; } } return (0); } void balance(double mat[], int n,int *low, int *hi, double scale[]) { /* Balance a matrix for calculation of eigenvalues and eigenvectors */ double c,f,g,r,s; int i,j,k,l,done; /* search for rows isolating an eigenvalue and push them down */ for (k = n - 1; k >= 0; k--) { for (j = k; j >= 0; j--) { for (i = 0; i <= k; i++) { if (i != j && fabs(mat[pos(j,i,n)]) != 0) break; } if (i > k) { scale[k] = j; if (j != k) { for (i = 0; i <= k; i++) { c = mat[pos(i,j,n)]; mat[pos(i,j,n)] = mat[pos(i,k,n)]; mat[pos(i,k,n)] = c; } for (i = 0; i < n; i++) { c = mat[pos(j,i,n)]; mat[pos(j,i,n)] = mat[pos(k,i,n)]; mat[pos(k,i,n)] = c; } } break; } } if (j < 0) break; } /* search for columns isolating an eigenvalue and push them left */ for (l = 0; l <= k; l++) { for (j = l; j <= k; j++) { for (i = l; i <= k; i++) { if (i != j && fabs(mat[pos(i,j,n)]) != 0) break; } if (i > k) { scale[l] = j; if (j != l) { for (i = 0; i <= k; i++) { c = mat[pos(i,j,n)]; mat[pos(i,j,n)] = mat[pos(i,l,n)]; mat[pos(i,l,n)] = c; } for (i = l; i < n; i++) { c = mat[pos(j,i,n)]; mat[pos(j,i,n)] = mat[pos(l,i,n)]; mat[pos(l,i,n)] = c; } } break; } } if (j > k) break; } *hi = k; *low = l; /* balance the submatrix in rows l through k */ for (i = l; i <= k; i++) { scale[i] = 1; } do { for (done = 1,i = l; i <= k; i++) { for (c = 0,r = 0,j = l; j <= k; j++) { if (j != i) { c += fabs(mat[pos(j,i,n)]); r += fabs(mat[pos(i,j,n)]); } } if (c != 0 && r != 0) { g = r / BASE; f = 1; s = c + r; while (c < g) { f *= BASE; c *= BASE * BASE; } g = r * BASE; while (c >= g) { f /= BASE; c /= BASE * BASE; } if ((c + r) / f < 0.95 * s) { done = 0; g = 1 / f; scale[i] *= f; for (j = l; j < n; 
j++) { mat[pos(i,j,n)] *= g; } for (j = 0; j <= k; j++) { mat[pos(j,i,n)] *= f; } } } } } while (!done); } /* * Transform back eigenvectors of a balanced matrix * into the eigenvectors of the original matrix */ void unbalance(int n,double vr[],double vi[], int low, int hi, double scale[]) { int i,j,k; double tmp; for (i = low; i <= hi; i++) { for (j = 0; j < n; j++) { vr[pos(i,j,n)] *= scale[i]; vi[pos(i,j,n)] *= scale[i]; } } for (i = low - 1; i >= 0; i--) { if ((k = (int)scale[i]) != i) { for (j = 0; j < n; j++) { tmp = vr[pos(i,j,n)]; vr[pos(i,j,n)] = vr[pos(k,j,n)]; vr[pos(k,j,n)] = tmp; tmp = vi[pos(i,j,n)]; vi[pos(i,j,n)] = vi[pos(k,j,n)]; vi[pos(k,j,n)] = tmp; } } } for (i = hi + 1; i < n; i++) { if ((k = (int)scale[i]) != i) { for (j = 0; j < n; j++) { tmp = vr[pos(i,j,n)]; vr[pos(i,j,n)] = vr[pos(k,j,n)]; vr[pos(k,j,n)] = tmp; tmp = vi[pos(i,j,n)]; vi[pos(i,j,n)] = vi[pos(k,j,n)]; vi[pos(k,j,n)] = tmp; } } } } /* * Reduce the submatrix in rows and columns low through hi of real matrix mat to * Hessenberg form by elementary similarity transformations */ void elemhess(int job,double mat[],int n,int low,int hi, double vr[], double vi[], int work[]) { /* work[n] */ int i,j,m; double x,y; for (m = low + 1; m < hi; m++) { for (x = 0,i = m,j = m; j <= hi; j++) { if (fabs(mat[pos(j,m-1,n)]) > fabs(x)) { x = mat[pos(j,m-1,n)]; i = j; } } if ((work[m] = i) != m) { for (j = m - 1; j < n; j++) { y = mat[pos(i,j,n)]; mat[pos(i,j,n)] = mat[pos(m,j,n)]; mat[pos(m,j,n)] = y; } for (j = 0; j <= hi; j++) { y = mat[pos(j,i,n)]; mat[pos(j,i,n)] = mat[pos(j,m,n)]; mat[pos(j,m,n)] = y; } } if (x != 0) { for (i = m + 1; i <= hi; i++) { if ((y = mat[pos(i,m-1,n)]) != 0) { y = mat[pos(i,m-1,n)] = y / x; for (j = m; j < n; j++) { mat[pos(i,j,n)] -= y * mat[pos(m,j,n)]; } for (j = 0; j <= hi; j++) { mat[pos(j,m,n)] += y * mat[pos(j,i,n)]; } } } } } if (job) { for (i=0; i low; m--) { for (i = m + 1; i <= hi; i++) { vr[pos(i,m,n)] = mat[pos(i,m-1,n)]; } if ((i = work[m]) != m) { for 
(j = m; j <= hi; j++) { vr[pos(m,j,n)] = vr[pos(i,j,n)]; vr[pos(i,j,n)] = 0.0; } vr[pos(i,m,n)] = 1.0; } } } } /* * Calculate eigenvalues and eigenvectors of a real upper Hessenberg matrix * Return 1 if converges successfully and 0 otherwise */ int realeig(int job,double mat[],int n,int low, int hi, double valr[], double vali[], double vr[],double vi[]) { complex v; double p=0,q=0,r=0,s=0,t,w,x,y,z=0,ra,sa,norm,eps; int niter,en,i,j,k,l,m; double precision = pow((double)BASE,(double)(1-DIGITS)); eps = precision; for (i=0; i hi) valr[i] = mat[pos(i,i,n)]; } t = 0; en = hi; while (en >= low) { niter = 0; for (;;) { /* look for single small subdiagonal element */ for (l = en; l > low; l--) { s = fabs(mat[pos(l-1,l-1,n)]) + fabs(mat[pos(l,l,n)]); if (s == 0) s = norm; if (fabs(mat[pos(l,l-1,n)]) <= eps * s) break; } /* form shift */ x = mat[pos(en,en,n)]; if (l == en) { /* one root found */ valr[en] = x + t; if (job) mat[pos(en,en,n)] = x + t; en--; break; } y = mat[pos(en-1,en-1,n)]; w = mat[pos(en,en-1,n)] * mat[pos(en-1,en,n)]; if (l == en - 1) { /* two roots found */ p = (y - x) / 2; q = p * p + w; z = sqrt(fabs(q)); x += t; if (job) { mat[pos(en,en,n)] = x; mat[pos(en-1,en-1,n)] = y + t; } if (q < 0) { /* complex pair */ valr[en-1] = x+p; vali[en-1] = z; valr[en] = x+p; vali[en] = -z; } else { /* real pair */ z = (p < 0) ? p - z : p + z; valr[en-1] = x + z; valr[en] = (z == 0) ? 
x + z : x - w / z; if (job) { x = mat[pos(en,en-1,n)]; s = fabs(x) + fabs(z); p = x / s; q = z / s; r = sqrt(p*p+q*q); p /= r; q /= r; for (j = en - 1; j < n; j++) { z = mat[pos(en-1,j,n)]; mat[pos(en-1,j,n)] = q * z + p * mat[pos(en,j,n)]; mat[pos(en,j,n)] = q * mat[pos(en,j,n)] - p*z; } for (i = 0; i <= en; i++) { z = mat[pos(i,en-1,n)]; mat[pos(i,en-1,n)] = q * z + p * mat[pos(i,en,n)]; mat[pos(i,en,n)] = q * mat[pos(i,en,n)] - p*z; } for (i = low; i <= hi; i++) { z = vr[pos(i,en-1,n)]; vr[pos(i,en-1,n)] = q*z + p*vr[pos(i,en,n)]; vr[pos(i,en,n)] = q*vr[pos(i,en,n)] - p*z; } } } en -= 2; break; } if (niter == MAXITER) return(-1); if (niter != 0 && niter % 10 == 0) { t += x; for (i = low; i <= en; i++) mat[pos(i,i,n)] -= x; s = fabs(mat[pos(en,en-1,n)]) + fabs(mat[pos(en-1,en-2,n)]); x = y = 0.75 * s; w = -0.4375 * s * s; } niter++; /* look for two consecutive small subdiagonal elements */ for (m = en - 2; m >= l; m--) { z = mat[pos(m,m,n)]; r = x - z; s = y - z; p = (r * s - w) / mat[pos(m+1,m,n)] + mat[pos(m,m+1,n)]; q = mat[pos(m+1,m+1,n)] - z - r - s; r = mat[pos(m+2,m+1,n)]; s = fabs(p) + fabs(q) + fabs(r); p /= s; q /= s; r /= s; if (m == l || fabs(mat[pos(m,m-1,n)]) * (fabs(q)+fabs(r)) <= eps * (fabs(mat[pos(m-1,m-1,n)]) + fabs(z) + fabs(mat[pos(m+1,m+1,n)])) * fabs(p)) break; } for (i = m + 2; i <= en; i++) mat[pos(i,i-2,n)] = 0; for (i = m + 3; i <= en; i++) mat[pos(i,i-3,n)] = 0; /* double QR step involving rows l to en and columns m to en */ for (k = m; k < en; k++) { if (k != m) { p = mat[pos(k,k-1,n)]; q = mat[pos(k+1,k-1,n)]; r = (k == en - 1) ? 0 : mat[pos(k+2,k-1,n)]; if ((x = fabs(p) + fabs(q) + fabs(r)) == 0) continue; p /= x; q /= x; r /= x; } s = sqrt(p*p+q*q+r*r); if (p < 0) s = -s; if (k != m) { mat[pos(k,k-1,n)] = -s * x; } else if (l != m) { mat[pos(k,k-1,n)] = -mat[pos(k,k-1,n)]; } p += s; x = p / s; y = q / s; z = r / s; q /= p; r /= p; /* row modification */ for (j = k; j <= (!job ? 
en : n-1); j++){ p = mat[pos(k,j,n)] + q * mat[pos(k+1,j,n)]; if (k != en - 1) { p += r * mat[pos(k+2,j,n)]; mat[pos(k+2,j,n)] -= p * z; } mat[pos(k+1,j,n)] -= p * y; mat[pos(k,j,n)] -= p * x; } j = min(en,k+3); /* column modification */ for (i = (!job ? l : 0); i <= j; i++) { p = x * mat[pos(i,k,n)] + y * mat[pos(i,k+1,n)]; if (k != en - 1) { p += z * mat[pos(i,k+2,n)]; mat[pos(i,k+2,n)] -= p*r; } mat[pos(i,k+1,n)] -= p*q; mat[pos(i,k,n)] -= p; } if (job) { /* accumulate transformations */ for (i = low; i <= hi; i++) { p = x * vr[pos(i,k,n)] + y * vr[pos(i,k+1,n)]; if (k != en - 1) { p += z * vr[pos(i,k+2,n)]; vr[pos(i,k+2,n)] -= p*r; } vr[pos(i,k+1,n)] -= p*q; vr[pos(i,k,n)] -= p; } } } } } if (!job) return(0); if (norm != 0) { /* back substitute to find vectors of upper triangular form */ for (en = n-1; en >= 0; en--) { p = valr[en]; if ((q = vali[en]) < 0) { /* complex vector */ m = en - 1; if (fabs(mat[pos(en,en-1,n)]) > fabs(mat[pos(en-1,en,n)])) { mat[pos(en-1,en-1,n)] = q / mat[pos(en,en-1,n)]; mat[pos(en-1,en,n)] = (p - mat[pos(en,en,n)]) / mat[pos(en,en-1,n)]; } else { v = cdiv(compl(0.0,-mat[pos(en-1,en,n)]), compl(mat[pos(en-1,en-1,n)]-p,q)); mat[pos(en-1,en-1,n)] = v.re; mat[pos(en-1,en,n)] = v.im; } mat[pos(en,en-1,n)] = 0; mat[pos(en,en,n)] = 1; for (i = en - 2; i >= 0; i--) { w = mat[pos(i,i,n)] - p; ra = 0; sa = mat[pos(i,en,n)]; for (j = m; j < en; j++) { ra += mat[pos(i,j,n)] * mat[pos(j,en-1,n)]; sa += mat[pos(i,j,n)] * mat[pos(j,en,n)]; } if (vali[i] < 0) { z = w; r = ra; s = sa; } else { m = i; if (vali[i] == 0) { v = cdiv(compl(-ra,-sa),compl(w,q)); mat[pos(i,en-1,n)] = v.re; mat[pos(i,en,n)] = v.im; } else { /* solve complex equations */ x = mat[pos(i,i+1,n)]; y = mat[pos(i+1,i,n)]; v.re = (valr[i]- p)*(valr[i]-p) + vali[i]*vali[i] - q*q; v.im = (valr[i] - p)*2*q; if ((fabs(v.re) + fabs(v.im)) == 0) { v.re = eps * norm * (fabs(w) + fabs(q) + fabs(x) + fabs(y) + fabs(z)); } v = cdiv(compl(x*r-z*ra+q*sa,x*s-z*sa-q*ra),v); mat[pos(i,en-1,n)] = 
v.re; mat[pos(i,en,n)] = v.im; if (fabs(x) > fabs(z) + fabs(q)) { mat[pos(i+1,en-1,n)] = (-ra - w * mat[pos(i,en-1,n)] + q * mat[pos(i,en,n)]) / x; mat[pos(i+1,en,n)] = (-sa - w * mat[pos(i,en,n)] - q * mat[pos(i,en-1,n)]) / x; } else { v = cdiv(compl(-r-y*mat[pos(i,en-1,n)], -s-y*mat[pos(i,en,n)]),compl(z,q)); mat[pos(i+1,en-1,n)] = v.re; mat[pos(i+1,en,n)] = v.im; } } } } } else if (q == 0) { /* real vector */ m = en; mat[pos(en,en,n)] = 1; for (i = en - 1; i >= 0; i--) { w = mat[pos(i,i,n)] - p; r = mat[pos(i,en,n)]; for (j = m; j < en; j++) { r += mat[pos(i,j,n)] * mat[pos(j,en,n)]; } if (vali[i] < 0) { z = w; s = r; } else { m = i; if (vali[i] == 0) { if ((t = w) == 0) t = eps * norm; mat[pos(i,en,n)] = -r / t; } else { /* solve real equations */ x = mat[pos(i,i+1,n)]; y = mat[pos(i+1,i,n)]; q = (valr[i] - p) * (valr[i] - p) + vali[i]*vali[i]; t = (x * s - z * r) / q; mat[pos(i,en,n)] = t; if (fabs(x) <= fabs(z)) { mat[pos(i+1,en,n)] = (-s - y * t) / z; } else { mat[pos(i+1,en,n)] = (-r - w * t) / x; } } } } } } /* vectors of isolated roots */ for (i = 0; i < n; i++) { if (i < low || i > hi) { for (j = i; j < n; j++) { vr[pos(i,j,n)] = mat[pos(i,j,n)]; } } } /* multiply by transformation matrix */ for (j = n-1; j >= low; j--) { m = min(j,hi); for (i = low; i <= hi; i++) { for (z = 0,k = low; k <= m; k++) { z += vr[pos(i,k,n)] * mat[pos(k,j,n)]; } vr[pos(i,j,n)] = z; } } } /* rearrange complex eigenvectors */ for (j = 0; j < n; j++) { if (vali[j] != 0) { for (i = 0; i < n; i++) { vi[pos(i,j,n)] = vr[pos(i,j+1,n)]; vr[pos(i,j+1,n)] = vr[pos(i,j,n)]; vi[pos(i,j+1,n)] = -vi[pos(i,j,n)]; } j++; } } return(0); } Seq-Gen-1.3.4/source/eigen.h000077500000000000000000000013711315746145500155360ustar00rootroot00000000000000/* Header file for eigen.c */ /* Sequence Generator - seq-gen, version 1.3.4 Copyright (c)1996-2017, Andrew Rambaut & Nick Grassly Institute of Evolutionary Biology, University of Edinburgh The code in this file is taken from Ziheng Yang's PAML 
package. http://abacus.gene.ucl.ac.uk/ Any feedback is very welcome. http://tree.bio.ed.ac.uk/software/seqgen/ email: a.rambaut@ed.ac.uk */ #ifndef _EIGEN_H_ #define _EIGEN_H_ int abyx (double a, double x[], int n); int xtoy (double x[], double y[], int n); int matinv( double x[], int n, int m, double space[]); int eigen(int job, double A[], int n, double rr[], double ri[], double vr[], double vi[], double w[]); #endif /* _EIGEN_H_ */ Seq-Gen-1.3.4/source/evolve.c000077500000000000000000000336241315746145500157500ustar00rootroot00000000000000/* Sequence Generator - seq-gen, version 1.3.4 Copyright (c)1996-2017, Andrew Rambaut & Nick Grassly Institute of Evolutionary Biology, University of Edinburgh All rights reserved. Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: 1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. 2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. 3. The names of its contributors may not be used to endorse or promote products derived from this software without specific prior written permission. THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. Any feedback is very welcome. http://tree.bio.ed.ac.uk/software/seqgen/ email: a.rambaut@ed.ac.uk */ #include #include #include #include #include "global.h" #include "tree.h" #include "evolve.h" #include "model.h" #include "gamma.h" #include "twister.h" int numTaxa, numSites, maxPartitions, numPartitions, fileFormat; double gammaShape, proportionInvariable; int numCats, rateHetero, invariableSites; double catRate[MAX_RATE_CATS]; double freqRate[MAX_RATE_CATS]; static double *matrix[MAX_RATE_CATS]; static double *vector; static double *gammaRates; static short *categories; static short *invariable; static double *siteRates; /* prototypes */ char SetState(double *P); short IsInvariable(); void SetSequence(char *seq, char *source, int inFromSite, int inNumSites); void SetNucSequence(char *seq, char *source, int inFromSite, int inNumSites); void SetAASequence(char *seq, char *source, int inFromSite, int inNumSites); void RandomSequence(char *seq, int inNumSites); void MutateSequence(char *seq, int inFromSite, int inNumSites, double len); void SetSiteRates(int inFromSite, int inNumSites, double inScale); void EvolveNode(TNode *anc, TNode *des, int inFromSite, int inNumSites, double scale); void WriteFastaFormat(FILE *fv, TTree **treeSet, int *partitionLengths); void WritePhylipFormat(FILE *fv, TTree **treeSet, int *partitionLengths); void WriteNexusFormat(FILE *fv, int treeNo, int datasetNo, TTree **treeSet, int *partitionLengths); void 
WriteAncestralSequencesNode(FILE *fv, TTree *tree, int *nodeNo, TNode *des); /* functions */ void CreateRates() { int i; if (rateHetero==GammaRates) gammaRates=CAllocMem(numSites*sizeof(double), "gammaRates", "CreateRates", 0); else if (rateHetero==DiscreteGammaRates) categories=CAllocMem(numSites*sizeof(short), "categories", "CreateRates", 0); if (invariableSites) invariable=CAllocMem(numSites*sizeof(short), "invariable", "CreateRates", 0); siteRates=CAllocMem(numSites*sizeof(double), "siteRates", "CreateRates", 0); for (i = 0; i < MAX_RATE_CATS; i++) { matrix[i] = CAllocMem(numStates*numStates*sizeof(double), "matrix", "CreateRates", 0); } vector = CAllocMem(numStates*sizeof(double), "vector", "CreateRates", 0); } void CreateSequences(TTree *tree, int inNumSites) { TNode *P; P=tree->nodeList; while (P!=NULL) { if (P->sequence != NULL) free(P->sequence); P->sequence=CAllocMem(inNumSites+1, "sequences", "CreateSequences", 0); P=P->next; } } void SetCategories() { int i; double sumRates; if (rateHetero==CodonRates) { sumRates=catRate[0]+catRate[1]+catRate[2]; if (sumRates!=3.0) { catRate[0]*=3.0/sumRates; catRate[1]*=3.0/sumRates; catRate[2]*=3.0/sumRates; } } else if (rateHetero==GammaRates) { for (i=0; i(*P) && jlength0*scale; memcpy(des->sequence, anc->sequence, inNumSites); MutateSequence(des->sequence, inFromSite, inNumSites, len); if (des->tipNo==-1) { EvolveNode(des, des->branch1, inFromSite, inNumSites, scale); EvolveNode(des, des->branch2, inFromSite, inNumSites, scale); } } void EvolveSequences(TTree *tree, int inFromSite, int inNumSites, double scale, char *ancestor) { if (ancestor==NULL) RandomSequence(tree->root->sequence, inNumSites); else SetSequence(tree->root->sequence, ancestor, inFromSite, inNumSites); /* Rescale the branch lengths to accommodate the invariable sites */ if (invariableSites) scale /= (1.0 - proportionInvariable); SetSiteRates(inFromSite, inNumSites, scale); EvolveNode(tree->root, tree->root->branch1, inFromSite, inNumSites, scale); 
EvolveNode(tree->root, tree->root->branch2, inFromSite, inNumSites, scale); if (!tree->rooted) EvolveNode(tree->root, tree->root->branch0, inFromSite, inNumSites, scale); } void WriteSequences(FILE *fv, int treeNo, int datasetNo, TTree **treeSet, int *partitionLengths) { switch (fileFormat) { case PHYLIPFormat: case RelaxedFormat: WritePhylipFormat(fv, treeSet, partitionLengths); break; case NEXUSFormat: WriteNexusFormat(fv, treeNo, datasetNo, treeSet, partitionLengths); break; case FASTAFormat: WriteFastaFormat(fv, treeSet, partitionLengths); break; } } void WritePhylipFormat(FILE *fv, TTree **treeSet, int *partitionLengths) { int i, j, k; char *P; fprintf(fv, " %d %d\n", numTaxa, numSites); for (i=0; inames[i]); else { j=0; P=treeSet[0]->names[i]; while (j<10 && *P) { fputc(*P, fv); j++; P++; } while (j<10) { fputc(' ', fv); j++; } } for (k = 0; k < numPartitions; k++) { P=treeSet[k]->tips[i]->sequence; for (j=0; j%s\n", treeSet[0]->names[i]); for (j = 0; j < numPartitions; j++) { char *P; P=treeSet[j]->tips[i]->sequence; for (k=0; k 0 && datasetNo > 0) fprintf(fv, "Begin DATA;\t[Tree %d, Dataset %d]\n", treeNo, datasetNo); else if (treeNo > 0) fprintf(fv, "Begin DATA;\t[Tree %d]\n", treeNo); else if (datasetNo > 0) fprintf(fv, "Begin DATA;\t[Dataset %d]\n", datasetNo); else fprintf(fv, "Begin DATA;\n"); fprintf(fv, "\tDimensions NTAX=%d NCHAR=%d;\n", numTaxa, numSites); if (isNucModel) { fprintf(fv, "\tFormat MISSING=? GAP=- DATATYPE=DNA;\n"); } else { fprintf(fv, "\tFormat MISSING=? 
GAP=- DATATYPE=PROTEIN;\n"); } fprintf(fv, "\tMatrix\n"); maxLen = strlen(treeSet[0]->names[0]); for (i=1; inames[i]); if (len > maxLen) maxLen = len; } for (i=0; inames[i]); len = maxLen - strlen(treeSet[0]->names[i]); for (j = 0; j < len; j++) { fputc(' ', fv); } for (k = 0; k < numPartitions; k++) { P=treeSet[k]->tips[i]->sequence; for (j=0; jrooted) n = (2 * numTaxa) - 3; else n = (2 * numTaxa) - 2; fprintf(fv, " %d %d\n", n, numSites); n = numTaxa + 1; fprintf(fv, "%d\t", n); P=tree->root->sequence; for (j=0; jrooted) WriteAncestralSequencesNode(fv, tree, &n, tree->root->branch0); WriteAncestralSequencesNode(fv, tree, &n, tree->root->branch1); WriteAncestralSequencesNode(fv, tree, &n, tree->root->branch2); } void WriteAncestralSequencesNode(FILE *fv, TTree *tree, int *nodeNo, TNode *des) { int j; char *P = des->sequence; if (des->tipNo==-1) { (*nodeNo)++; fprintf(fv, "%d\t", *nodeNo); for (j=0; jbranch1); WriteAncestralSequencesNode(fv, tree, nodeNo, des->branch2); } else { fprintf(fv, "%s\t", tree->names[des->tipNo]); for (j=0; j #include "twister.h" #include "gamma.h" double LnGamma (double alpha); double IncompleteGamma (double x, double alpha, double ln_gamma_alpha); double PointNormal (double prob); double PointChi2 (double prob, double v); double rndgamma1 (double s); double rndgamma2 (double s); double rndgamma (double s) { double r=0.0; if (s <= 0.0) return 0; else if (s < 1.0) r = rndgamma1 (s); else if (s > 1.0) r = rndgamma2 (s); else r =- log(rndu()); return (r); } double rndgamma1 (double s) { double r, x=0.0, small=1e-37, w; static double a, p, uf, ss=10.0, d; if (s!=ss) { a = 1.0-s; p = a/(a+s*exp(-a)); uf = p*pow(small/a,s); d = a*log(a); ss = s; } for (;;) { r = rndu(); if (r > p) x = a-log((1.0-r)/(1.0-p)), w=a*log(x)-d; else if (r>uf) x = a*pow(r/p,1/s), w=x; else return (0.0); r = rndu(); if (1.0-r <= w && r > 0.0) if (r*(w+1.0) >= 1.0 || -log(r) <= w) continue; break; } return (x); } double rndgamma2 (double s) { double r ,d, f, g, x; 
static double b, h, ss=0; if (s!=ss) { b = s-1.0; h = sqrt(3.0*s-0.75); ss = s; } for (;;) { r = rndu(); g = r-r*r; f = (r-0.5)*h/sqrt(g); x = b+f; if (x <= 0.0) continue; r = rndu(); d = 64*r*r*g*g*g; if (d*x < x-2.0*f*f || log(d) < 2*(b*log(x/b)-f)) break; } return (x); } double LnGamma (double alpha) { /* returns ln(gamma(alpha)) for alpha>0, accurate to 10 decimal places. Stirling's formula is used for the central polynomial part of the procedure. Pike MC & Hill ID (1966) Algorithm 291: Logarithm of the gamma function. Communications of the Association for Computing Machinery, 9:684 */ double x=alpha, f=0, z; if (x<7) { f=1; z=x-1; while (++z<7) f*=z; x=z; f=-log(f); } z = 1/(x*x); return f + (x-0.5)*log(x) - x + .918938533204673 + (((-.000595238095238*z+.000793650793651)*z-.002777777777778)*z +.083333333333333)/x; } double IncompleteGamma (double x, double alpha, double ln_gamma_alpha) { /* returns the incomplete gamma ratio I(x,alpha) where x is the upper limit of the integration and alpha is the shape parameter. returns (-1) if in error ln_gamma_alpha = ln(Gamma(alpha)), is almost redundant. (1) series expansion if (alpha>x || x<=1) (2) continued fraction otherwise RATNEST FORTRAN by Bhattacharjee GP (1970) The incomplete gamma integral. 
Applied Statistics, 19: 285-287 (AS32) */ int i; double p=alpha, g=ln_gamma_alpha; double accurate=1e-8, overflow=1e30; double factor, gin=0, rn=0, a=0,b=0,an=0,dif=0, term=0, pn[6]; if (x==0) return (0); if (x<0 || p<=0) return (-1); factor=exp(p*log(x)-x-g); if (x>1 && x>=p) goto l30; /* (1) series expansion */ gin=1; term=1; rn=p; l20: rn++; term*=x/rn; gin+=term; if (term > accurate) goto l20; gin*=factor/p; goto l50; l30: /* (2) continued fraction */ a=1-p; b=a+x+1; term=0; pn[0]=1; pn[1]=x; pn[2]=x+1; pn[3]=x*b; gin=pn[2]/pn[3]; l32: a++; b+=2; term++; an=a*term; for (i=0; i<2; i++) pn[i+4]=b*pn[i+2]-an*pn[i]; if (pn[5] == 0) goto l35; rn=pn[4]/pn[5]; dif=fabs(gin-rn); if (dif>accurate) goto l34; if (dif<=accurate*rn) goto l42; l34: gin=rn; l35: for (i=0; i<4; i++) pn[i]=pn[i+2]; if (fabs(pn[4]) < overflow) goto l32; for (i=0; i<4; i++) pn[i]/=overflow; goto l32; l42: gin=1-factor*gin; l50: return (gin); } /* functions concerning the CDF and percentage points of the gamma and Chi2 distribution */ double PointNormal (double prob) { /* returns z so that Prob{x.999998 || v<=0) return (-1); g = LnGamma (v/2); xx=v/2; c=xx-1; if (v >= -1.24*log(p)) goto l1; ch=pow((p*xx*exp(g+xx*aa)), 1/xx); if (ch-e<0) return (ch); goto l4; l1: if (v>.32) goto l3; ch=0.4; a=log(1-p); l2: q=ch; p1=1+ch*(4.67+ch); p2=ch*(6.73+ch*(6.66+ch)); t=-0.5+(4.67+2*ch)/p1 - (6.73+ch*(13.32+3*ch))/p2; ch-=(1-exp(a+g+.5*ch+c*aa)*p2/p1)/t; if (fabs(q/ch-1)-.01 <= 0) goto l4; else goto l2; l3: x=PointNormal (p); p1=0.222222/v; ch=v*pow((x*sqrt(p1)+1-p1), 3.0); if (ch>2.2*v+6) ch=-2*(log(1-p)-c*log(.5*ch)+g); l4: q=ch; p1=.5*ch; if ((t=IncompleteGamma (p1, xx, g))<0) { return (-1); } p2=p-t; t=p2*exp(xx*aa+g+p1-c*log(ch)); b=t/ch; a=0.5*t-b*c; s1=(210+a*(140+a*(105+a*(84+a*(70+60*a))))) / 420; s2=(420+a*(735+a*(966+a*(1141+1278*a))))/2520; s3=(210+a*(462+a*(707+932*a)))/2520; s4=(252+a*(672+1182*a)+c*(294+a*(889+1740*a)))/5040; s5=(84+264*a+c*(175+606*a))/2520; s6=(120+c*(346+127*c))/5040; 
ch+=t*(1+0.5*t*s1-b*c*(s1-b*(s2-b*(s3-b*(s4-b*(s5-b*s6)))))); if (fabs(q/ch-1) > e) goto l4; return (ch); } #define PointGamma(prob,alpha,beta) PointChi2(prob,2.0*(alpha))/(2.0*(beta)) int DiscreteGamma (double freqK[], double rK[], double alfa, double beta, int K, int median) { /* discretization of gamma distribution with equal proportions in each category */ int i; double gap05=1.0/(2.0*K), t, factor=alfa/beta*K, lnga1; if (median) { for (i=0; i #include #include #include #include "global.h" int verbose=0, verboseMemory=0, quiet=0, userSeed; long totalMem=0; unsigned long randomSeed; /* functions */ /*************************************/ void *AllocMem(long n, char *name, char *func, int showInfo) { void *P; if ( (P=malloc(n))==NULL ) { fprintf(stderr, "Out of memory allocating '%s': %s()\n", name, func); exit(0); } totalMem+=n; if (showInfo && verboseMemory) fprintf(stderr, "%s in %s() - %ld bytes\n", name, func, n); return P; } /*************************************/ void *CAllocMem(long n, char *name, char *func, int showInfo) { void *P; if ( (P=calloc(n, 1))==NULL ) { fprintf(stderr, "Out of memory allocating '%s': %s()\n", name, func); exit(0); } totalMem+=n; if (showInfo && verboseMemory) fprintf(stderr, "%s in %s() - %ld bytes\n", name, func, n); return P; } /*************************************/ int GetDoubleParams(int argc, char **argv, int *argn, char *pos, int numParams, double *params) { int i; char *st, buf[256]; i=0; strcpy(buf, pos); st=strtok(buf, "\t,/"); do { if (st==NULL) { if ((*argn)+1 #include #include #include #include "evolve.h" #include "model.h" #include "nucmodels.h" #include "aamodels.h" char *modelNames[numModels]={ "HKY", "F84", "GTR", "JTT", "WAG", "PAM", "BLOSUM", "MTREV", "CPREV", "MTART", "LG", "GENERAL" }; char *modelTitles[numModels]={ "HKY: Hasegawa, Kishino & Yano (1985)", "F84: Felsenstein (1984)", "GTR: General time reversible (nucleotides)", "JTT: Jones, Taylor & Thornton (1992) CABIOS 8:275-282\n DCMut version Kosiol & 
Goldman (2004) ", "WAG: Whelan & Goldman (2001) Mol Biol Evol 18:691-699", "PAM: Dayhoff, Schwartz & Orcutt (1978)\n DCMut version Kosiol & Goldman (2004) ", "BLOSUM62: Henikoff & Henikoff (1992) PNAS USA 89:10915-10919", "MTREV24: Adachi & Hasegawa (1996) J Mol Evol 42:459-468", "CPREV45: Adachi et al. (2000) J Mol Evol 50:348-358", "MTART: Abascal et al. (2007) Mol Biol Evol 24:1-5", "LG: Le & Gascuel (2008) Mol Biol Evol 25:1307-1320", "GENERAL: General time reversible (amino acids)" }; int model, numStates, isNucModel, userFreqs, equalFreqs; char *stateCharacters; double *freq, *addFreq; void SetupFrequencies(); void SetupMatrices(); void InitialiseEigen(); void elmhes(double** a, int* ordr, int n); void mcdiv(double ar, double ai, double br, double bi); void hqr2(int n, int low, int hgh, double** h, double** zz, double* wr, double* wi); void eltran(double** a, double** zz, int* ordr, int n); void luinverse(double** inmat, double** imtrx, int size); /*************************************/ void SetModel(int theModel) { int i; model=theModel; if (isNucModel) { numStates = NUM_NUC; SetNucModel(theModel); freq = nucFreq; addFreq = nucAddFreq; stateCharacters = nucleotides; } else { numStates = NUM_AA; SetAAModel(theModel - numNucModels); freq = aaFreq; addFreq = aaAddFreq; stateCharacters = aminoAcids; } addFreq[0]=freq[0]; for (i = 1; i < numStates; i++) { addFreq[i] = addFreq[i-1] + freq[i]; } } void SetMatrix(double *matrix, double len) { if (isNucModel) { SetNucMatrix(matrix, len); } else { SetAAMatrix(matrix, len); } } void SetVector(double *vector, short state, double len) { if (isNucModel) { SetNucVector(vector, state, len); } else { SetAAVector(vector, state, len); } } Seq-Gen-1.3.4/source/model.h000077500000000000000000000045161315746145500155530ustar00rootroot00000000000000/* Header file for model.c */ /* Sequence Generator - seq-gen, version 1.3.4 Copyright (c)1996-2017, Andrew Rambaut & Nick Grassly Institute of Evolutionary Biology, University of
Edinburgh All rights reserved. Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: 1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. 2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. 3. The names of its contributors may not be used to endorse or promote products derived from this software without specific prior written permission. THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. Any feedback is very welcome. 
http://tree.bio.ed.ac.uk/software/seqgen/ email: a.rambaut@ed.ac.uk */ #ifndef _MODEL_H_ #define _MODEL_H_ extern char *stateCharacters; enum { NONE=-1, F84, HKY, GTR, JTT, WAG, PAM, BLOSUM, MTREV, CPREV, MTART, LG, GENERAL, numModels }; extern char *modelNames[numModels]; extern char *modelTitles[numModels]; extern int model, isNucModel, numStates, userFreqs, equalFreqs; extern double *freq, *addFreq; void SetModel(int theModel); void SetMatrix(double *matrix, double len); void SetVector(double *vector, short state, double len); #endif /* _MODEL_H_ */ Seq-Gen-1.3.4/source/nucmodels.c000077500000000000000000000257011315746145500164360ustar00rootroot00000000000000/* Sequence Generator - seq-gen, version 1.3.4 Copyright (c)1996-2017, Andrew Rambaut & Nick Grassly Institute of Evolutionary Biology, University of Edinburgh All rights reserved. Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: 1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. 2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. 3. The names of its contributors may not be used to endorse or promote products derived from this software without specific prior written permission. THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

Any feedback is very welcome.
http://tree.bio.ed.ac.uk/software/seqgen/
email: a.rambaut@ed.ac.uk
*/

#include
#include
#include
#include
#include "eigen.h"
#include "nucmodels.h"

char *nucleotides="ACGT";

int nucModel = NUC_NONE;
int equalTstv;

double nucFreq[NUM_NUC] = { 0.25, 0.25, 0.25, 0.25 };
double nucAddFreq[NUM_NUC];

double nucMatrix[MAX_RATE_CATS][SQNUM_NUC];
double nucVector[NUM_NUC];

double freqR, freqY, freqAG, freqCT;
double freqA, freqC, freqG, freqT;
double tstv, kappa;
static double beta, beta_A_R, beta_A_Y;
static double tab1A, tab2A, tab3A;
static double tab1C, tab2C, tab3C;
static double tab1G, tab2G, tab3G;
static double tab1T, tab2T, tab3T;
static double mu, mu_kappa_1;

double nucRelativeRates[NUM_NUC_REL_RATES] = { 1.0, 1.0, 1.0, 1.0, 1.0, 1.0};
static double Qij[SQNUM_NUC], Cijk[CUNUM_NUC], Root[NUM_NUC];

void SetHKYMatrix(double *matrix, double len);
void SetHKYVector(double *vector, short state, double len);
void SetF84Matrix(double *matrix, double len);
void SetF84Vector(double *vector, short base, double len);
void SetGTRMatrix(double *matrix, double len);
void SetGTRVector(double *vector, short state, double len);
void CommonMatrix(double aa, double bbR, double bbY, double *matrix);
void CommonVector(double aa, double bbR, double bbY, double *vector, short state);
void CumulativeRows(double *matrix);
void SetupGTR();
void CheckNucFrequencies();

/*************************************/
void SetNucModel(int theNucModel)
{
	double xi, xv,
		F84temp1, F84temp2;
	double freqAG, freqCT, freqA2, freqC2, freqG2, freqT2;

	nucModel = theNucModel;

	freqA=nucFreq[A];
	freqC=nucFreq[C];
	freqG=nucFreq[G];
	freqT=nucFreq[T];

	if (nucModel==NUC_F84 || nucModel==NUC_HKY) {
		freqR=freqA+freqG;
		freqY=freqC+freqT;
		freqAG=freqA*freqG;
		freqCT=freqC*freqT;

		tab1A=freqA*((1/freqR)-1);
		tab2A=(freqR-freqA)/freqR;
		tab3A=freqA/freqR;
		tab1C=freqC*((1/freqY)-1);
		tab2C=(freqY-freqC)/freqY;
		tab3C=freqC/freqY;
		tab1G=freqG*((1/freqR)-1);
		tab2G=(freqR-freqG)/freqR;
		tab3G=freqG/freqR;
		tab1T=freqT*((1/freqY)-1);
		tab2T=(freqY-freqT)/freqY;
		tab3T=freqT/freqY;
	}

	switch (nucModel) {
		case NUC_HKY:
			if (!equalTstv) {
				kappa=(tstv*freqR*freqY)/(freqAG+freqCT);
			} else {
				kappa = 1.0;
				tstv=(kappa*(freqA*freqG + freqC*freqT))/(freqR*freqY);
			}
			beta=-1.0/(2*(freqR*freqY + kappa*(freqAG+freqCT)));
			beta_A_R=beta*(1.0+freqR*(kappa-1));
			beta_A_Y=beta*(1.0+freqY*(kappa-1));
		break;
		case NUC_F84:
			freqA2=freqA*freqA;
			freqC2=freqC*freqC;
			freqG2=freqG*freqG;
			freqT2=freqT*freqT;
			F84temp1=freqA2+freqC2+freqG2+freqT2;
			F84temp2=((freqA2/freqR)+(freqC2/freqY)+(freqG2/freqR)+(freqT2/freqY));
			if (!equalTstv) {
				xi=freqR*freqY*(freqR*freqY*tstv-freqAG-freqCT);
				xv=(freqCT*freqR)+(freqAG*freqY);
				kappa=xi/xv;
			} else {
				kappa = 0.0;
				tstv=(freqY*(freqAG*freqR+freqCT*freqR))/(freqR*freqR*freqY*freqY);
			}
			mu=-1.0/((1-F84temp1)+(kappa*(1-F84temp2)));
			mu_kappa_1=mu*(kappa+1);
		break;
		case NUC_GTR:
			SetupGTR();
		break;
	}
}

void SetNucMatrix(double *matrix, double len)
{
	switch (nucModel) {
		case NUC_HKY: SetHKYMatrix(matrix, len); break;
		case NUC_F84: SetF84Matrix(matrix, len); break;
		case NUC_GTR: SetGTRMatrix(matrix, len); break;
	}
}

void SetNucVector(double *vector, short state, double len)
{
	switch (nucModel) {
		case NUC_HKY: SetHKYVector(vector, state, len); break;
		case NUC_F84: SetF84Vector(vector, state, len); break;
		case NUC_GTR: SetGTRVector(vector, state, len); break;
	}
}

void SetHKYMatrix(double *matrix, double len)
{
	double aa, bbR, bbY;

	aa=exp(beta*len);
	bbR=exp(beta_A_R*len);
	bbY=exp(beta_A_Y*len);

	CommonMatrix(aa, bbR, bbY, matrix);
}

void SetHKYVector(double *vector, short state, double len)
{
	double aa, bbR, bbY;

	aa=exp(beta*len);
	bbR=exp(beta_A_R*len);
	bbY=exp(beta_A_Y*len);

	CommonVector(aa, bbR, bbY, vector, state);
}

void SetF84Matrix(double *matrix, double len)
{
	double aa, bbR, bbY;

	aa=exp(mu*len);
	bbR=bbY=exp(mu_kappa_1*len);

	CommonMatrix(aa, bbR, bbY, matrix);
}

void SetF84Vector(double *vector, short state, double len)
{
	double aa, bbR, bbY;

	aa=exp(mu*len);
	bbR=bbY=exp(mu_kappa_1*len);

	CommonVector(aa, bbR, bbY, vector, state);
}

void SetGTRMatrix(double *matrix, double len)
{
	int i,j,k;
	double expt[4];
	double *P;

/* P(t)ij = SUM Cijk * exp{Root*t} */
	P=matrix;
	if (len<1e-6) {
		for (i=0; i<4; i++) {
			for (j=0; j<4; j++) {
				if (i==j)
					*P=1.0;
				else
					*P=0.0;
				P++;
			}
		}
		return;
	}

	for (k=1; k<4; k++) {
		expt[k]=exp(len*Root[k]);
	}
	for (i=0; i<4; i++) {
		for (j=0; j<4; j++) {
			(*P)=Cijk[i*4*4+j*4+0];
			for (k=1; k<4; k++) {
				(*P)+=Cijk[i*4*4+j*4+k]*expt[k];
			}
			P++;
		}
	}

	CumulativeRows(matrix);
}

void SetGTRVector(double *vector, short state, double len)
{
	int i,j,k;
	double expt[4];
	double *P;

	P=vector;
	if (len<1e-6) {
		for (i=0; i<4; i++) {
			if (i==state)
				*P=1.0;
			else
				*P=0.0;
			P++;
		}
		return;
	}
	for (k=1; k<4; k++) {
		expt[k]=exp(len*Root[k]);
	}

	for (j=0; j<4; j++) {
		(*P)=Cijk[state*4*4+j*4+0];
		for (k=1; k<4; k++) {
			(*P)+=Cijk[state*4*4+j*4+k]*expt[k];
		}
		P++;
	}

	vector[1]+=vector[0];
	vector[2]+=vector[1];
	vector[3]+=vector[2];
}

#define PIJ_SAME_A freqA+(tab1A*aa)+(tab2A*bbR)
#define PIJ_TS_A freqA+(tab1A*aa)-(tab3A*bbR)
#define PIJ_TV_A freqA*(1-aa)
#define PIJ_SAME_C freqC+(tab1C*aa)+(tab2C*bbY)
#define PIJ_TS_C freqC+(tab1C*aa)-(tab3C*bbY)
#define PIJ_TV_C freqC*(1-aa)
#define PIJ_SAME_G freqG+(tab1G*aa)+(tab2G*bbR)
#define PIJ_TS_G freqG+(tab1G*aa)-(tab3G*bbR)
#define PIJ_TV_G freqG*(1-aa)
#define PIJ_SAME_T freqT+(tab1T*aa)+(tab2T*bbY)
#define PIJ_TS_T freqT+(tab1T*aa)-(tab3T*bbY)
#define PIJ_TV_T freqT*(1-aa)

void CommonMatrix(double aa, double
bbR, double bbY, double *matrix)
{
	matrix[0]=PIJ_SAME_A;
	matrix[1]=PIJ_TV_C;
	matrix[2]=PIJ_TS_G;
	matrix[3]=PIJ_TV_T;

	matrix[4]=PIJ_TV_A;
	matrix[5]=PIJ_SAME_C;
	matrix[6]=PIJ_TV_G;
	matrix[7]=PIJ_TS_T;

	matrix[8]=PIJ_TS_A;
	matrix[9]=matrix[1]; /* PIJ_TV_C */
	matrix[10]=PIJ_SAME_G;
	matrix[11]=matrix[3]; /* PIJ_TV_T */

	matrix[12]=matrix[4]; /* PIJ_TV_A */
	matrix[13]=PIJ_TS_C;
	matrix[14]=matrix[6]; /* PIJ_TV_G */
	matrix[15]=PIJ_SAME_T;

	CumulativeRows(matrix);
}

void CumulativeRows(double *matrix)
{
/* the rows are cumulative to help with picking one using a random number */
	matrix[1]+=matrix[0];
	matrix[2]+=matrix[1];
	matrix[3]+=matrix[2]; /* This should always be 1.0... */

	matrix[5]+=matrix[4];
	matrix[6]+=matrix[5];
	matrix[7]+=matrix[6]; /* ...but it is easier to spot bugs... */

	matrix[9]+=matrix[8];
	matrix[10]+=matrix[9];
	matrix[11]+=matrix[10]; /* ...though less efficient... */

	matrix[13]+=matrix[12];
	matrix[14]+=matrix[13];
	matrix[15]+=matrix[14]; /* ...but probably not much. */
}

void CommonVector(double aa, double bbR, double bbY, double *vector, short state)
{
	switch (state) {
		case 0:
			vector[0]=PIJ_SAME_A;
			vector[1]=PIJ_TV_C+vector[0];
			vector[2]=PIJ_TS_G+vector[1];
			vector[3]=PIJ_TV_T+vector[2];
		break;
		case 1:
			vector[0]=PIJ_TV_A;
			vector[1]=PIJ_SAME_C+vector[0];
			vector[2]=PIJ_TV_G+vector[1];
			vector[3]=PIJ_TS_T+vector[2];
		break;
		case 2:
			vector[0]=PIJ_TS_A;
			vector[1]=PIJ_TV_C+vector[0];
			vector[2]=PIJ_SAME_G+vector[1];
			vector[3]=PIJ_TV_T+vector[2];
		break;
		case 3:
			vector[0]=PIJ_TV_A;
			vector[1]=PIJ_TS_C+vector[0];
			vector[2]=PIJ_TV_G+vector[1];
			vector[3]=PIJ_SAME_T+vector[2];
		break;
	}
}

void SetupGTR()
{
	int i,j,k;
	double mr;
	double sum;
	double U[SQNUM_NUC], V[SQNUM_NUC], T1[SQNUM_NUC], T2[SQNUM_NUC];

	CheckNucFrequencies();

	k=0;
	for (i=0; i maxfreq) {
			maxfreq = nucFreq[i];
			maxi = i;
		}
		sum += nucFreq[i];
	}

	diff = 1.0 - sum;
	nucFreq[maxi] += diff;

	for (i = 0; i < NUM_NUC - 1; i++) {
		for (j = i+1; j < NUM_NUC; j++) {
			if (nucFreq[i] == nucFreq[j]) {
				nucFreq[i] += MINFDIFF;
				nucFreq[j] -= MINFDIFF;
			}
		}
	}
}

Seq-Gen-1.3.4/source/nucmodels.h

/* Header file for nucmodels.c */

/*
Sequence Generator - seq-gen, version 1.3.4
Copyright (c)1996-2017, Andrew Rambaut & Nick Grassly
Institute of Evolutionary Biology, University of Edinburgh
All rights reserved.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.

2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.

3. The names of its contributors may not be used to endorse or promote products derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

Any feedback is very welcome.
http://tree.bio.ed.ac.uk/software/seqgen/
email: a.rambaut@ed.ac.uk
*/

#ifndef _NUC_MODELS_H_
#define _NUC_MODELS_H_

#include "evolve.h"

#define NUM_NUC 4
#define SQNUM_NUC 16
#define CUNUM_NUC 256

#define NUM_NUC_REL_RATES 6

enum { NUC_NONE = -1, NUC_HKY, NUC_F84, NUC_GTR, numNucModels };

extern char *nucleotides;

enum { A, C, G, T };

extern int equalTstv;

extern double nucFreq[NUM_NUC];
extern double nucAddFreq[NUM_NUC];

extern double nucMatrix[MAX_RATE_CATS][SQNUM_NUC];
extern double nucVector[NUM_NUC];
extern double nucRelativeRates[NUM_NUC_REL_RATES];
extern double tstv, kappa;

void SetNucModel(int theModel);
void SetNucMatrix(double *matrix, double len);
void SetNucVector(double *vector, short state, double len);

#endif /* _NUC_MODELS_H_ */

Seq-Gen-1.3.4/source/progress.c

/*
Sequence Generator - seq-gen, version 1.3.4
Copyright (c)1996-2017, Andrew Rambaut & Nick Grassly
Institute of Evolutionary Biology, University of Edinburgh
All rights reserved.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.

2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.

3. The names of its contributors may not be used to endorse or promote products derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED.
IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

Any feedback is very welcome.
http://tree.bio.ed.ac.uk/software/seqgen/
email: a.rambaut@ed.ac.uk
*/

#include
#include
#include "progress.h"

static int barLength, dotGap, dots, bar;

/*********************/
void InitProgressBar(int total)
{
	barLength=total;
	dotGap=1;
	if (barLength>MAX_BAR_LENGTH) {
		dotGap=barLength/MAX_BAR_LENGTH;
		if (barLength%dotGap)
			barLength=(barLength/dotGap)+1;
		else
			barLength/=dotGap;
	}
}

/*********************/
void DrawProgressBar()
{
	int i;

	if (barLength<2)
		return;

	fprintf(stderr, "0%%|");
	for (i=0; i<(barLength); i++)
		fputc('_', stderr);
	fprintf(stderr, "|100%%\n [");
	fflush(stderr);
	dots=0;
	bar=0;
}

/*********************/
void ProgressBar()
{
	if (barLength<2)
		return;

	if (bar%dotGap==0) {
		fputc('.', stderr);
		fflush(stderr);
		dots++;
		if (dots==barLength) {
			fputc(']', stderr);
			fputc('\n', stderr);
		}
		fflush(stderr);
	}
	bar++;
}

Seq-Gen-1.3.4/source/progress.h

/* Header file for progress.c */

/*
Sequence Generator - seq-gen, version 1.3.4
Copyright (c)1996-2017, Andrew Rambaut & Nick Grassly
Institute of Evolutionary Biology, University of Edinburgh
All rights reserved.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.

2.
Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.

3. The names of its contributors may not be used to endorse or promote products derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

Any feedback is very welcome.
http://tree.bio.ed.ac.uk/software/seqgen/
email: a.rambaut@ed.ac.uk
*/

#ifndef _PROGRESS_H_
#define _PROGRESS_H_

#define MAX_BAR_LENGTH 20

/* prototypes */

void InitProgressBar(int total);
void DrawProgressBar();
void ProgressBar();

#endif /* _PROGRESS_H_ */

Seq-Gen-1.3.4/source/seq-gen.c

/*
Sequence Generator - seq-gen, version 1.3.4
Copyright (c)1996-2017, Andrew Rambaut & Nick Grassly
Institute of Evolutionary Biology, University of Edinburgh
All rights reserved.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.

2.
Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. 3. The names of its contributors may not be used to endorse or promote products derived from this software without specific prior written permission. THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. Any feedback is very welcome. 
http://tree.bio.ed.ac.uk/software/seqgen/
email: a.rambaut@ed.ac.uk
*/

#include
#include
#include
#include
#include
#include
#include "global.h"
#include "treefile.h"
#include "evolve.h"
#include "model.h"
#include "nucmodels.h"
#include "aamodels.h"
#include "progress.h"
#include "twister.h"

#define PROGRAM_NAME "seq-gen"
#define VERSION_NUMBER "Version 1.3.4"

int treeFile, textFile, numDatasets, numTrees;
int scaleTrees, scaleBranches, ancestorSeq, writeAncestors, writeRates;
int *partitionLengths;
double *partitionRates;
double treeScale, branchScale;
char treeFileName[256];
char textFileName[256];

int hasAlignment, numSequences, numAlignmentSites;
char **names;
char **sequences;

FILE *tree_fv;

/* prototypes */

static void PrintTitle();
static void PrintUsage();
static void PrintVerbose(FILE *fv);
static void ReadParams();
static void ReadFileParams();
static void AllocateMemory();
static void ReadFile();
static int OpenTreeFile();

/* functions */

static void PrintTitle()
{
	fprintf(stderr, "Sequence Generator - %s\n", PROGRAM_NAME);
	fprintf(stderr, "%s\n", VERSION_NUMBER);
	fprintf(stderr, "(c) Copyright, 1996-2017 Andrew Rambaut and Nick Grassly\n");
	fprintf(stderr, "Institute of Evolutionary Biology, University of Edinburgh\n\n");
	fprintf(stderr, "Originally developed at:\n");
	fprintf(stderr, "Department of Zoology, University of Oxford\n\n");
}

static void PrintUsage()
{
	fprintf(stderr, "Usage: seq-gen [-m MODEL] [-l #] [-n #] [-p #] [-s # | -d #] [-k #]\n");
	fprintf(stderr, " [-c #1 #2 #3 | -a # [-g #]] [-i #] [-f e | #] [-t # | -r #]\n");
	fprintf(stderr, " [-z #] [-o[p][r][n]] [-w[a][r]] [-x NAME] [-q] [-h] [treefile]\n");
	fprintf(stderr, " -l: # = sequence length [default = 1000].\n");
	fprintf(stderr, " -n: # = simulated datasets per tree [default = 1].\n");
	fprintf(stderr, " -p: # = number of partitions (and trees) per sequence [default = 1].\n");
	fprintf(stderr, " -s: # = branch length scaling factor [default = 1.0].\n");
	fprintf(stderr, " -d: # = total tree scale [default = use branch lengths].\n");
	fprintf(stderr, " -k: # = use sequence k as ancestral (needs alignment) [default = random].\n");
	fprintf(stderr, "\n Substitution model options:\n");
	fprintf(stderr, " -m: MODEL = HKY, F84, GTR, JTT, WAG, PAM, BLOSUM, MTREV, CPREV45, MTART, LG, GENERAL\n");
	fprintf(stderr, " HKY, F84 & GTR are for nucleotides, the rest are for amino acids\n");
	fprintf(stderr, " -a: # = shape (alpha) for gamma rate heterogeneity [default = none].\n");
	fprintf(stderr, " -g: # = number of gamma rate categories [default = continuous].\n");
	fprintf(stderr, " -i: # = proportion of invariable sites [default = 0.0].\n");
	fprintf(stderr, "\n Nucleotide model specific options:\n");
	fprintf(stderr, " -c: #1 #2 #3 = rates for codon position heterogeneity [default = none].\n");
	fprintf(stderr, " -t: # = transition-transversion ratio [default = equal rate].\n");
	fprintf(stderr, " -r: #1 #2 #3 #4 #5 #6 = general rate matrix [default = all 1.0].\n");
	fprintf(stderr, " -f: #A #C #G #T = nucleotide frequencies [default = all equal].\n");
	fprintf(stderr, "\n Amino Acid model specific options:\n");
	fprintf(stderr, " specify using the order ARNDCQEGHILKMFPSTWYV\n");
	fprintf(stderr, " -r: #1 .. #190 = general rate matrix [default = all 1.0].\n");
	fprintf(stderr, " -f: #1 .. #20 = amino acid frequencies e=equal [default = matrix freqs].\n");
	fprintf(stderr, "\n Miscellaneous options:\n");
	fprintf(stderr, " -z: # = seed for random number generator [default = system generated].\n");
	fprintf(stderr, " -o: Output file format [default = PHYLIP]\n");
	fprintf(stderr, " p PHYLIP format\n");
	fprintf(stderr, " r relaxed PHYLIP format\n");
	fprintf(stderr, " n NEXUS format\n");
	fprintf(stderr, " f FASTA format\n");
	fprintf(stderr, " -w: Write additional information [default = none]\n");
	fprintf(stderr, " a Write ancestral sequences for each node\n");
	fprintf(stderr, " r Write rate for each site\n");
	fprintf(stderr, " -x: NAME = a text file to insert after every dataset [default = none].\n");
	fprintf(stderr, " -h: Give this help message\n");
	fprintf(stderr, " -q: Quiet\n");
	fprintf(stderr, " treefile: name of tree file [default = trees on stdin]\n\n");
}

void ReadParams(int argc, char **argv)
{
	int i, j, k;
	char ch, *P, st[4];
	int modelTwoArgs = 0;

	model=NONE;

	scaleTrees=0;
	treeScale=0.0;
	scaleBranches=0;
	branchScale=0.0;

	maxPartitions=1;
	numPartitions=1;

	userSeed = 0;

	numCats=1;
	rateHetero=NoRates;

	catRate[0]=1.0;
	gammaShape=1.0;

	invariableSites=0;
	proportionInvariable = 0.0;

	equalFreqs = 1;
	equalTstv = 1;
	tstv=0.50002;

	for (i = 0; i < NUM_AA_REL_RATES; i++) {
		aaRelativeRate[i] = 1.0;
	}

	for (i = 0; i < NUM_AA; i++) {
		aaFreq[i] = 1.0;
	}

	aaFreqSet = 0;

	numSites=-1;
	numDatasets=1;

	ancestorSeq=0;

	writeAncestors=0;
	writeRates=0;

	verbose=1;
	fileFormat = PHYLIPFormat;
	quiet=0;

	treeFile=0;
	textFile=0;

	for (i=1; i k) {
					modelTwoArgs = 1;
				}
				model=-1;
				for (j=F84; j= 1.0) {
					fprintf(stderr, "Bad Proportion of Invariable Sites: %s\n\n", argv[i]);
					exit(1);
				}
				invariableSites = 1;
			break;
			case 'A':
				if (rateHetero==CodonRates) {
					fprintf(stderr, "You can only have codon rates or gamma rates not both\n\n");
					exit(1);
				}

				if (rateHetero==NoRates)
					rateHetero=GammaRates;
				if (GetDoubleParams(argc, argv, &i, P, 1, &gammaShape) || gammaShape<=0.0) {
					fprintf(stderr, "Bad Gamma Shape: %s\n\n", argv[i]);
					exit(1);
				}
			break;
			case 'G':
				if (rateHetero==CodonRates) {
					fprintf(stderr, "You can only have codon rates or gamma rates not both\n\n");
					exit(1);
				}

				rateHetero=DiscreteGammaRates;
				if (GetIntParams(argc, argv, &i, P, 1, &numCats) || numCats<2 || numCats>MAX_RATE_CATS) {
					fprintf(stderr, "Bad number of Gamma Categories: %s\n\n", argv[i]);
					exit(1);
				}
			break;
			case 'F':
				if (isNucModel) {
					if (toupper(*P)=='E'){
						/* do nothing - equal freqs is default for nucleotides */
					} else {
						equalFreqs = 0;
						if (GetDoubleParams(argc, argv, &i, P, NUM_NUC, nucFreq)) {
							fprintf(stderr, "Bad Nucleotide Frequencies: %s\n\n", argv[i]);
							exit(1);
						}
					}
				} else {
					aaFreqSet = 1;
					if (toupper(*P)=='E'){
						equalFreqs = 1;
						for(j=0;j 1) {
		fprintf(fv, " and %d partitions (and trees) per dataset\n", numPartitions);
		fprintf(fv, " Partition No. Sites Relative Rate\n");
		for (i = 0; i < numPartitions; i++)
			fprintf(fv, " %4d %7d %lf\n", i+1, partitionLengths[i], partitionRates[i]);
	}
	fputc('\n', fv);

	if (hasAlignment) {
		fprintf(fv, "Alignment read: numSequences = %d, numAlignmentSites = %d\n", numSequences, numAlignmentSites);
		if (ancestorSeq > 0) {
			fprintf(fv, "Using sequence %d as the ancestral sequence\n", ancestorSeq);
		}
		fputc('\n', fv);
	}

	if (scaleTrees) {
		fprintf(fv, "Branch lengths of trees scaled so that tree is %G from root to tip\n\n", treeScale);
	} else if (scaleBranches) {
		fprintf(fv, "Branch lengths of trees multiplied by %G\n\n", branchScale);
	} else {
		fprintf(fv, "Branch lengths assumed to be number of substitutions per site\n\n");
	}

	if (rateHetero==CodonRates) {
		fprintf(fv, "Codon position rate heterogeneity:\n");
		fprintf(fv, " rates = 1:%f 2:%f 3:%f\n", catRate[0], catRate[1], catRate[2]);
	} else if (rateHetero==GammaRates) {
		fprintf(fv, "Continuous gamma rate heterogeneity:\n");
		fprintf(fv, " shape = %f\n", gammaShape);
	} else if (rateHetero==DiscreteGammaRates) {
		fprintf(fv, "Discrete gamma rate heterogeneity:\n");
		fprintf(fv, " shape = %f, %d categories\n", gammaShape,
			numCats);
	} else
		fprintf(fv, "Rate homogeneity of sites.\n");
	if (invariableSites) {
		fprintf(fv, "Invariable sites model:\n");
		fprintf(fv, " proportion invariable = %f\n", proportionInvariable);
	}
	fprintf(fv, "Model = %s\n", modelTitles[model]);
	if (isNucModel) {
		if (equalTstv) {
			fprintf(fv, " Rate of transitions and transversions equal:\n");
		}
		if (model==F84) {
			fprintf(fv, " transition/transversion ratio = %G (K=%G)\n", tstv, kappa);
		} else if (model==HKY) {
			fprintf(fv, " transition/transversion ratio = %G (kappa=%G)\n", tstv, kappa);
		} else if (model==GTR) {
			fprintf(fv, " rate matrix = gamma1:%7.4f alpha1:%7.4f beta1:%7.4f\n", nucRelativeRates[0], nucRelativeRates[1], nucRelativeRates[2]);
			fprintf(fv, " beta2:%7.4f alpha2:%7.4f\n", nucRelativeRates[3], nucRelativeRates[4]);
			fprintf(fv, " gamma2: %7.4f\n", nucRelativeRates[5]);
		}

		if (equalFreqs) {
			fprintf(fv, " with nucleotide frequencies equal.\n");
		} else {
			fprintf(fv, " with nucleotide frequencies specified as:\n");
			fprintf(fv, " A=%G C=%G G=%G T=%G\n\n", freq[A], freq[C], freq[G], freq[T]);
		}
	} else {
		if (aaFreqSet) {
			if (equalFreqs) {
				fprintf(fv, " with amino acid frequencies equal.\n\n");
			} else {
				fprintf(fv, " with amino acid frequencies specified as:\n");
				fprintf(fv, " ");
				for (i = 0; i < NUM_AA; i++) {
					fprintf(fv, " %c=%G", aminoAcids[i], freq[i]);
				}
				fprintf(fv, "\n\n");
			}
		}
	}
}

void ReadFileParams()
{
	int ch;	/* int, not char: fgetc() may return EOF, and isspace()/isdigit() need an int */
	char st[256];

	hasAlignment=0;

	ch=fgetc(tree_fv);
	while (!feof(tree_fv) && isspace(ch)) {
		ch=fgetc(tree_fv);
	}
	ungetc(ch, tree_fv);

	if (ch!='(' && isdigit(ch)) {
		fgets(st, 255, tree_fv);
		if ( sscanf( st, " %d %d", &numSequences, &numAlignmentSites)!=2 ) {
			fprintf(stderr, "Unable to read parameters from standard input\n");
			exit(2);
		}
		hasAlignment=1;
		// fprintf(stderr, "%d sequences, %d sites\n", numSequences, numAlignmentSites);
	}
}

void AllocateMemory()
{
	int i;

	names=(char **)AllocMem(sizeof(char *)*numSequences, "names", "AllocateMemory", 0);
	sequences=(char **)AllocMem(sizeof(char *)*numSequences, "sequences", "AllocateMemory", 0);
	for (i=0; i 1) {
		fprintf(stderr, "Writing ancestral sequences can only be used for a single partition.\n");
		exit(4);
	}

	if (!userSeed)
		randomSeed = CreateSeed();

	SetSeed(randomSeed);

	if (!quiet)
		PrintTitle();

	numTrees = OpenTreeFile();

	/* if (!treeFile) { */
		ReadFileParams();
	/*} */

	if ((ancestorSeq>0 && !hasAlignment) || ancestorSeq>numSequences) {
		fprintf(stderr, "Bad ancestral sequence number: %d (%d sequences loaded)\n", ancestorSeq, numSequences);
		exit(4);
	}

	if (textFile) {
		if ( (text_fv=fopen(textFileName, "rt"))==NULL ) {
			fprintf(stderr, "Error opening text file for insertion into output: '%s'\n", textFileName);
			exit(4);
		}
	}

	ancestor=NULL;
	if (hasAlignment) {
		AllocateMemory();
		ReadFile();

		if (numSites<0)
			numSites=numAlignmentSites;

		if (ancestorSeq>0) {
			if (numSites!=numAlignmentSites) {
				fprintf(stderr, "Ancestral sequence is of a different length to the simulated sequences (%d)\n", numAlignmentSites);
				exit(4);
			}
			ancestor=sequences[ancestorSeq-1];
		}
	} else if (numSites<0)
		numSites=1000;

	SetModel(model);

	numTaxa=-1;
	scale=1.0;

	/* an array of maxPartitions pointers, so each element is a TTree * */
	treeSet = (TTree **)malloc(sizeof(TTree *) * maxPartitions);
	if (treeSet==NULL) {
		fprintf(stderr, "Out of memory\n");
		exit(5);
	}

	partitionLengths = (int *)malloc(sizeof(int) * maxPartitions);
	if (partitionLengths==NULL) {
		fprintf(stderr, "Out of memory\n");
		exit(5);
	}

	partitionRates = (double *)malloc(sizeof(double) * maxPartitions);
	if (partitionRates==NULL) {
		fprintf(stderr, "Out of memory\n");
		exit(5);
	}

	for (i = 0; i < maxPartitions; i++) {
		if ((treeSet[i]=NewTree())==NULL) {
			fprintf(stderr, "Out of memory\n");
			exit(5);
		}
	}

	CreateRates();

	treeNo=0;
	do {
		partitionLengths[0] = -1;
		ReadTree(tree_fv, treeSet[0], treeNo+1, 0, NULL, &partitionLengths[0], &partitionRates[0]);

		if (treeNo==0) {
			numTaxa=treeSet[0]->numTips;

			if (!quiet)
				fprintf(stderr, "Random number generator seed: %ld\n\n", randomSeed);

			if (fileFormat == NEXUSFormat) {
				fprintf(stdout, "#NEXUS\n");
				fprintf(stdout, "[\nGenerated by %s %s\n\n",
					PROGRAM_NAME, VERSION_NUMBER);
				PrintVerbose(stdout);
				fprintf(stdout, "]\n\n");
			}
		} else if (treeSet[0]->numTips != numTaxa) {
			fprintf(stderr, "All trees must have the same number of tips.\n");
			exit(4);
		}

		if (maxPartitions == 1) {
			if (partitionLengths[0] != -1) {
				fprintf(stderr, "\nWARNING: The treefile contained partition lengths but only one partition\n");
				fprintf(stderr, "was specified.\n");
			}
			partitionLengths[0] = numSites;
		}

		sumLength = partitionLengths[0];
		i = 1;
		while (sumLength < numSites && i <= maxPartitions) {
			if (!IsTreeAvail(tree_fv)) {
				fprintf(stderr, "\nA set of trees number %d had less partition length (%d) than\n", treeNo + 1, sumLength);
				fprintf(stderr, "was required to make a sequence of length %d.\n", numSites);
				exit(4);
			}

			ReadTree(tree_fv, treeSet[i], treeNo+1, treeSet[0]->numTips, treeSet[0]->names, &partitionLengths[i], &partitionRates[i]);

			if (treeSet[i]->numTips != numTaxa) {
				fprintf(stderr, "All trees must have the same number of tips.\n");
				exit(4);
			}

			sumLength += partitionLengths[i];
			i++;
		}
		if (i > maxPartitions) {
			fprintf(stderr, "\nA set of trees number %d had more partitions (%d) than\n", treeNo + 1, i);
			fprintf(stderr, "was specified in the user options (%d).\n", maxPartitions);
		}

		numPartitions = i;
		if (sumLength != numSites) {
			fprintf(stderr, "The sum of the partition lengths in the treefile does not equal\n");
			fprintf(stderr, "the specified number of sites.\n");
			exit(4);
		}

		for (i = 0; i < numPartitions; i++)
			CreateSequences(treeSet[i], partitionLengths[i]);

		if (numPartitions > 1) {
			sum = 0.0;
			for (i = 0; i < numPartitions; i++)
				sum += partitionRates[i] * partitionLengths[i];

			for (i = 0; i < numPartitions; i++)
				partitionRates[i] *= numSites / sum;
		}

		if (treeNo==0 && verbose && !quiet) {
			PrintVerbose(stderr);
			InitProgressBar(numTrees*numDatasets);
			DrawProgressBar();
		}

		for (i=0; irooted) {
						fprintf(stderr, "To scale tree length, they must be rooted and ultrametric.\n");
						exit(4);
					}
					scale *= treeScale/treeSet[j]->totalLength;
				} else if (scaleBranches)
					scale *= branchScale;

				EvolveSequences(treeSet[j], k, partitionLengths[j], scale, ancestor);
				k += partitionLengths[j];
			}

			if (writeAncestors)
				WriteAncestralSequences(stdout, treeSet[0]);
			else
				WriteSequences(stdout, (numTrees > 1 ? treeNo+1 : -1), (numDatasets > 1 ? i+1 : -1), treeSet, partitionLengths);

			if (writeRates) {
				WriteRates(stderr);
			}

			if (textFile) {
				while (!feof(text_fv)) {
					ch = fgetc(text_fv);
					if (!feof(text_fv))
						fputc(ch, stdout);
				}
				fputc('\n', stdout);
				rewind(text_fv);
			}

			if (verbose && !quiet)
				ProgressBar();
		}

		for (i = 0; i < numPartitions; i++)
			DisposeTree(treeSet[i]);

		treeNo++;
	} while (IsTreeAvail(tree_fv));

/*	for (i = 0; i < maxPartitions; i++)
		FreeTree(treeSet[i]); */

	if (treeFile)
		fclose(tree_fv);

	if (textFile)
		fclose(text_fv);

	totalSecs = (double)(clock() - totalStart) / CLOCKS_PER_SEC;
	if (!quiet) {
		fprintf(stderr, "Time taken: %G seconds\n", totalSecs);
		if (verboseMemory)
			fprintf(stderr, "Total memory used: %ld\n", totalMem);
	}

	return 0;
}

Seq-Gen-1.3.4/source/tree.h

/* Header file to define tree and node structures */

/*
Sequence Generator - seq-gen, version 1.3.4
Copyright (c)1996-2017, Andrew Rambaut & Nick Grassly
Institute of Evolutionary Biology, University of Edinburgh
All rights reserved.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.

2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.

3. The names of its contributors may not be used to endorse or promote products derived from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. Any feedback is very welcome. http://tree.bio.ed.ac.uk/software/seqgen/ email: a.rambaut@ed.ac.uk */ #ifndef _TREE_H_ #define _TREE_H_ #define MAX_NAME_LEN 256 typedef struct TNode TNode; struct TNode { TNode *branch0, *branch1, *branch2, *next; double length0, length1, length2, param; int tipNo; char *sequence; }; typedef struct TTree TTree; struct TTree { int rooted, lengths; TNode *root, *nodeList; int numTips, numNodes; double totalLength; char **names; TNode **tips; int capacity; }; #endif /* _TREE_H_ */ Seq-Gen-1.3.4/source/treefile.c000077500000000000000000000320431315746145500162410ustar00rootroot00000000000000/* Sequence Generator - seq-gen, version 1.3.4 Copyright (c)1996-2017, Andrew Rambaut & Nick Grassly Institute of Evolutionary Biology, University of Edinburgh All rights reserved. Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: 1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. 2. 
Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. 3. The names of its contributors may not be used to endorse or promote products derived from this software without specific prior written permission. THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. Any feedback is very welcome. 
http://tree.bio.ed.ac.uk/software/seqgen/ email: a.rambaut@ed.ac.uk */ #include #include #include #include #include #include "global.h" #include "treefile.h" char treeErrorMsg[256]; int treeError; TNode *avail=NULL; long usedAvail=0; long usedMalloc=0; /* prototypes */ TNode *NewNode(TTree *tree); void InitTree(TTree *tree); void CheckCapacity(TTree *tree, int required); void DisposeNode(TNode *aNode); void DisposeTreeNodes(TTree *tree); void FreeNodes(void); char ReadToNextChar(FILE *fv); void ReadUntil(FILE *fv, char stopChar, char *what); TNode *ReadTip(FILE *fv, char ch, TTree *tree, int numNames, char **names); TNode *ReadNode(FILE *fv, TTree *tree, int numNames, char **names, int detectPolytomies); TNode *ReadBranch(FILE *fv, TTree *tree, int numNames, char **names); void WriteNode(FILE *fv, TTree *tree, TNode *node); /* functions */ /*----------*/ TNode *NewNode(TTree *tree) { TNode *node; if ( avail!=NULL ) { node=avail; avail=avail->next; usedAvail++; } else { if ( (node=malloc(sizeof(TNode)))==NULL ) { strcpy(treeErrorMsg, "Out of memory"); return NULL; } usedMalloc++; } node->branch0=node->branch1=node->branch2=NULL; node->length0=node->length1=node->length2=0.0; node->param=0.0; node->tipNo=-1; node->sequence = NULL; node->next=tree->nodeList; tree->nodeList=node; tree->numNodes++; return node; } /* NewNode */ /*----------*/ void InitTree(TTree *tree) { tree->root=NULL; tree->nodeList=NULL; tree->numNodes=0; tree->numTips=0; tree->totalLength=0.0; tree->rooted=0; tree->lengths=-1; } /* InitTree */ /*----------*/ TTree *NewTree() { TTree *tree; if ( (tree=(TTree *)malloc(sizeof(TTree)))==NULL ) { strcpy(treeErrorMsg, "Out of memory creating tree."); return NULL; } memset(tree, 0, sizeof(TTree)); /* grj */ tree->capacity=0; CheckCapacity(tree, 1000); InitTree(tree); return tree; } /* NewTree */ /*----------*/ void CheckCapacity(TTree *tree, int required) { int i; int newCapacity = tree->capacity; char **newNames; TNode **newTips; while (newCapacity < 
required) { newCapacity += 1000; } newNames = (char**)CAllocMem(sizeof(char*)*newCapacity, "newNames", "CheckCapacity", 0); newTips = (TNode**)CAllocMem(sizeof(TNode*)*newCapacity, "newTips", "CheckCapacity", 0); for (i = 0; i < tree->capacity; i++) { newNames[i] = tree->names[i]; newTips[i] = tree->tips[i]; } for (i = tree->capacity; i < newCapacity; i++) { newNames[i] = NULL; newTips[i] = NULL; } if (tree->names) { free(tree->names); tree->names=0; } /* grj */ if (tree->tips) { free(tree->tips); tree->tips = 0; } /* grj */ tree->names = newNames; tree->tips = newTips; tree->capacity = newCapacity; } /* CheckCapacity */ /*----------*/ void DisposeNode(TNode *aNode) { aNode->next=avail; avail=aNode; } /* DisposeNode */ /*----------*/ void DisposeTreeNodes(TTree *tree) { TNode *P, *O; if ( tree ) { P=tree->nodeList; while (P!=NULL) { O=P; P=P->next; DisposeNode(O); } tree->nodeList=NULL; } } /* DisposeTreeNodes */ /*----------*/ void DisposeTree(TTree *tree) { if (tree) { DisposeTreeNodes(tree); InitTree(tree); } } /* DisposeTree */ /*----------*/ void FreeNodes(void) { TNode *P, *O; P=avail; while (P!=NULL) { O=P; P=P->next; free(O); } } /* FreeNodes */ /*----------*/ void FreeTree(TTree *tree) { if (tree) { DisposeTreeNodes(tree); free(tree); } FreeNodes(); } /* FreeTree */ /*----------*/ void WriteAvailInfo() { TNode *P; int count; count=0; P=avail; while (P!=NULL) { P=P->next; count++; } fprintf(stderr, "Avail: %d nodes - availed: %ld, malloced: %ld\n", count, usedAvail, usedMalloc); } /* WriteAvailInfo */ int CountTrees(FILE *fv) { int n; if (fv==NULL) return 0; n=0; while (!feof(fv)) { if (fgetc(fv)==';') n++; } rewind(fv); return n; } char ReadToNextChar(FILE *fv) { char ch; ch=fgetc(fv); while (!feof(fv) && isspace(ch)) ch=fgetc(fv); return ch; } void ReadUntil(FILE *fv, char stopChar, char *what) { char ch; ch=fgetc(fv); while (!feof(fv) && ch!=stopChar && ch!='(' && ch!=',' && ch!=':' && ch!=')' && ch!=';') ch=fgetc(fv); if (feof(fv) || ch!=stopChar) { 
sprintf(treeErrorMsg, "%s missing", what); treeError=1; } } TNode *ReadTip(FILE *fv, char ch, TTree *tree, int numNames, char **names) { int i; char *P; char name[256]; TNode *node; node=NewNode(tree); i=0; P=name; while (!feof(fv) && ch!=':' && ch!=',' && ch!=')' && inumTips+1); if (numNames == 0) { node->tipNo=tree->numTips; if (tree->names[node->tipNo]==NULL) { if ( (tree->names[node->tipNo]=(char *)malloc(MAX_NAME_LEN+1))==NULL ) { strcpy(treeErrorMsg, "Out of memory creating name."); return NULL; } } strcpy(tree->names[node->tipNo], name); } else { /* we already have some names so just look it up...*/ i = 0; while (i < numNames && strcmp(name, names[i]) != 0) i++; if (i == numNames) { sprintf(treeErrorMsg, "Taxon names in trees for different partitions do not match."); return NULL; } node->tipNo=i; } tree->tips[node->tipNo]=node; tree->numTips++; while (!feof(fv) && ch!=':' && ch!=',' && ch!=')') ch=fgetc(fv); if (feof(fv)) { sprintf(treeErrorMsg, "Unexpected end of file"); return NULL; } ungetc(ch, fv); return node; } TNode *ReadNode(FILE *fv, TTree *tree, int numNames, char **names, int detectPolytomies) { TNode *node, *node2; char ch; if ((node=NewNode(tree))==NULL) return NULL; if ((node2=ReadBranch(fv, tree, numNames, names))==NULL) return NULL; node->branch1=node2; node2->branch0=node; node->length1=node2->length0; ReadUntil(fv, ',', "Comma"); if (treeError) return NULL; if ((node2=ReadBranch(fv, tree, numNames, names))==NULL) return NULL; node->branch2=node2; node2->branch0=node; node->length2=node2->length0; ch=fgetc(fv); while (!feof(fv) && ch!=':' && ch!=',' && ch!=')' && ch!=';') ch=fgetc(fv); if (detectPolytomies && ch==',') { fprintf(stderr, "This tree contains nodes which aren't bifurcations. Resolve the node\n"); fprintf(stderr, "with zero branch lengths to obtain correct results. 
This can be done\n"); fprintf(stderr, "with a program called TreeEdit: http://evolve.zoo.ox.ac.uk/software/TreeEdit\n"); exit(4); } if (feof(fv)) { sprintf(treeErrorMsg, "Unexpected end of file"); return NULL; } ungetc(ch, fv); return node; } TNode *ReadBranch(FILE *fv, TTree *tree, int numNames, char **names) { char ch; double len, param=0.0; TNode *node; ch=ReadToNextChar(fv); if (ch=='(') { /* is a node */ node=ReadNode(fv, tree, numNames, names, 1); if (node==NULL) return NULL; /* treeErrorMsg already set */ ReadUntil(fv, ')', "Closing bracket"); if (treeError) return NULL; } else { /* is a tip */ node=ReadTip(fv, ch, tree, numNames, names); if (node==NULL) return NULL; /* treeErrorMsg already set */ } ch=ReadToNextChar(fv); if (ch==':') { if (tree->lengths==0) { sprintf(treeErrorMsg, "Some branches don't have branch lengths"); return NULL; } else tree->lengths=1; if (fscanf(fv, "%lf", &len)!=1) { sprintf(treeErrorMsg, "Unable to read branch length"); return NULL; } ch=ReadToNextChar(fv); if (ch=='[') { if (fscanf(fv, "%lf", &param)!=1) { sprintf(treeErrorMsg, "Unable to read branch parameter"); return NULL; } ReadUntil(fv, ']', "Close square bracket"); } else ungetc(ch, fv); } else { if (tree->lengths==1) { sprintf(treeErrorMsg, "Some branches don't have branch lengths"); return NULL; } else tree->lengths=0; len=0.0; ungetc(ch, fv); } node->length0=len; node->param=param; return node; } void ReadTree(FILE *fv, TTree *tree, int treeNum, int numNames, char **names, int *outNumSites, double *outRelRate) { char ch; TNode *P; treeError=0; tree->numNodes=0; tree->numTips=0; tree->rooted=1; tree->lengths=-1; (*outRelRate) = 1.0; ch=fgetc(fv); while (!feof(fv) && ch!='(' && ch!='[') ch=fgetc(fv); if (ch == '[') { if (fscanf(fv, "%d", outNumSites)!=1) { fprintf(stderr, "Error reading tree number %d: Unable to read partition length.\n", treeNum); exit(4); } ch=fgetc(fv); while (!feof(fv) && ch!=',' && ch!='(') ch=fgetc(fv); if (ch == ',') { if (fscanf(fv, "%lf", outRelRate)!=1) { fprintf(stderr, "Error reading tree number %d: Unable to read partition relative rate.\n", treeNum); exit(4); } ch=fgetc(fv); while (!feof(fv) && ch!='(') ch=fgetc(fv); } } if (ch!='('
|| (tree->root=ReadNode(fv, tree, numNames, names, 0))==NULL) { fprintf(stderr, "Error reading tree number %d: %s.\n", treeNum, treeErrorMsg); exit(4); } ch=fgetc(fv); while (!feof(fv) && ch!=',' && ch!=')' && ch!=';') ch=fgetc(fv); if (ch==',') { /* the tree is unrooted */ tree->rooted=0; if ((tree->root->branch0=ReadBranch(fv, tree, numNames, names))==NULL) { fprintf(stderr, "Error reading tree number %d: %s.\n", treeNum, treeErrorMsg); exit(4); } tree->root->branch0->branch0=tree->root; tree->root->length0=tree->root->branch0->length0; } tree->totalLength=0.0; if (tree->rooted) { P=tree->root; while (P!=NULL) { tree->totalLength+=P->length0; P=P->branch1; } } } int IsTreeAvail(FILE *fv) { char ch; ch=fgetc(fv); while (!feof(fv) && ch!='(' && ch!='[') ch=fgetc(fv); if (ch=='(' || ch=='[') ungetc(ch, fv); return (!feof(fv)); } void WriteNode(FILE *fv, TTree *tree, TNode *node) { if (node->tipNo==-1) { fputc('(', fv); WriteNode(fv, tree, node->branch1); fputc(',', fv); WriteNode(fv, tree, node->branch2); fputc(')', fv); } else { fprintf(fv, "%s", tree->names[node->tipNo]); } if (tree->lengths) fprintf(fv, ":%.6f", node->length0); } void WriteTree(FILE *fv, TTree *tree) { fputc('(', fv); if (tree->rooted) { WriteNode(fv, tree, tree->root->branch1); fputc(',', fv); WriteNode(fv, tree, tree->root->branch2); } else { WriteNode(fv, tree, tree->root->branch1); fputc(',', fv); WriteNode(fv, tree, tree->root->branch2); fputc(',', fv); WriteNode(fv, tree, tree->root->branch0); } fprintf(fv, ");\n"); } void UnrootRTree(TTree *tree) /* Used to unroot a rooted tree */ { TNode *P, *Q, *R, *newRoot; double len, len2; if (!tree->rooted || tree->numTips<3) return; P=tree->tips[0]; Q=P->branch0; newRoot=Q; while (Q!=tree->root) { R=Q->branch0; len=Q->length0; if (Q->branch1==P) { len2=Q->length1; Q->branch1=R; Q->length1=len; } else { len2=Q->length2; Q->branch2=R; Q->length2=len; } Q->branch0=P; Q->length0=len2; P=Q; Q=R; } len=R->length1+R->length2; if (R->branch1==P) 
Q=R->branch2; else Q=R->branch1; Q->branch0=P; Q->length0=len; if (P->branch1==R) { P->branch1=Q; P->length1=len; } else { P->branch2=Q; P->length2=len; } tree->root=newRoot; DisposeNode(R); tree->rooted=0; } void RerootUTree(TTree *tree, int tip) /* Used to reroot an unrooted tree */ /* This may sound strange but all it does is move the root trichotomy */ { TNode *P, *Q, *R, *newRoot; double len, len2; if (tree->rooted) return; P=tree->tips[tip]; Q=P->branch0; newRoot=Q; while (P!=tree->root) { R=Q->branch0; len=Q->length0; if (Q->branch1==P) { len2=Q->length1; Q->branch1=R; Q->length1=len; } else { len2=Q->length2; Q->branch2=R; Q->length2=len; } Q->branch0=P; Q->length0=len2; P=Q; Q=R; } tree->root=newRoot; } Seq-Gen-1.3.4/source/treefile.h000077500000000000000000000046601315746145500162520ustar00rootroot00000000000000/* Header file for treefile.c */ /* Sequence Generator - seq-gen, version 1.3.4 Copyright (c)1996-2017, Andrew Rambaut & Nick Grassly Institute of Evolutionary Biology, University of Edinburgh All rights reserved. Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: 1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. 2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. 3. The names of its contributors may not be used to endorse or promote products derived from this software without specific prior written permission. THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. Any feedback is very welcome. http://tree.bio.ed.ac.uk/software/seqgen/ email: a.rambaut@ed.ac.uk */ #ifndef _TREEFILE_H_ #define _TREEFILE_H_ /* This contains the structures, TTree and TNode which */ /* can be edited to add more elements */ #include "tree.h" TTree *NewTree(); void DisposeTree(TTree *tree); void FreeTree(TTree *tree); void WriteAvailInfo(); int CountTrees(FILE *fv); void ReadTree(FILE *fv, TTree *tree, int treeNum, int numNames, char **names, int *outNumSites, double *outRelRate); int IsTreeAvail(FILE *fv); void WriteTree(FILE *fv, TTree *tree); void UnrootRTree(TTree *tree); void RerootUTree(TTree *tree, int tip); #endif /* _TREEFILE_H_ */ Seq-Gen-1.3.4/source/twister.c000077500000000000000000000156271315746145500161540ustar00rootroot00000000000000/* Sequence Generator - seq-gen, version 1.3.4 Copyright (c)1996-2017, Andrew Rambaut & Nick Grassly Institute of Evolutionary Biology, University of Edinburgh The code in this file is covered by the license and copyright message given below. Any feedback is very welcome. http://tree.bio.ed.ac.uk/software/seqgen/ email: a.rambaut@ed.ac.uk */ /* A C-program for MT19937, with initialization improved 2002/1/26. Coded by Takuji Nishimura and Makoto Matsumoto. Before using, initialize the state by using init_genrand(seed) or init_by_array(init_key, key_length). Copyright (C) 1997 - 2002, Makoto Matsumoto and Takuji Nishimura, All rights reserved. 
Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: 1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. 2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. 3. The names of its contributors may not be used to endorse or promote products derived from this software without specific prior written permission. THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. Any feedback is very welcome. 
http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/emt.html email: m-mat @ math.sci.hiroshima-u.ac.jp (remove space) */ #include #include #include #include "twister.h" /* Period parameters */ #define N 624 #define M 397 #define MATRIX_A 0x9908b0dfUL /* constant vector a */ #define UPPER_MASK 0x80000000UL /* most significant w-r bits */ #define LOWER_MASK 0x7fffffffUL /* least significant r bits */ static unsigned long mt[N]; /* the array for the state vector */ static int mti=N+1; /* mti==N+1 means mt[N] is not initialized */ /* initializes mt[N] with a seed */ void init_genrand(unsigned long s) { mt[0]= s & 0xffffffffUL; for (mti=1; mti> 30)) + mti); /* See Knuth TAOCP Vol2. 3rd Ed. P.106 for multiplier. */ /* In the previous versions, MSBs of the seed affect */ /* only MSBs of the array mt[]. */ /* 2002/01/09 modified by Makoto Matsumoto */ mt[mti] &= 0xffffffffUL; /* for >32 bit machines */ } } /* initialize by an array with array-length */ /* init_key is the array for initializing keys */ /* key_length is its length */ /* slight change for C++, 2004/2/26 */ void init_by_array(unsigned long init_key[], int key_length) { int i, j, k; init_genrand(19650218UL); i=1; j=0; k = (N>key_length ? 
N : key_length); for (; k; k--) { mt[i] = (mt[i] ^ ((mt[i-1] ^ (mt[i-1] >> 30)) * 1664525UL)) + init_key[j] + j; /* non linear */ mt[i] &= 0xffffffffUL; /* for WORDSIZE > 32 machines */ i++; j++; if (i>=N) { mt[0] = mt[N-1]; i=1; } if (j>=key_length) j=0; } for (k=N-1; k; k--) { mt[i] = (mt[i] ^ ((mt[i-1] ^ (mt[i-1] >> 30)) * 1566083941UL)) - i; /* non linear */ mt[i] &= 0xffffffffUL; /* for WORDSIZE > 32 machines */ i++; if (i>=N) { mt[0] = mt[N-1]; i=1; } } mt[0] = 0x80000000UL; /* MSB is 1; assuring non-zero initial array */ } /* generates a random number on [0,0xffffffff]-interval */ unsigned long genrand_int32(void) { unsigned long y; static unsigned long mag01[2]={0x0UL, MATRIX_A}; /* mag01[x] = x * MATRIX_A for x=0,1 */ if (mti >= N) { /* generate N words at one time */ int kk; if (mti == N+1) /* if init_genrand() has not been called, */ init_genrand(5489UL); /* a default initial seed is used */ for (kk=0;kk> 1) ^ mag01[y & 0x1UL]; } for (;kk> 1) ^ mag01[y & 0x1UL]; } y = (mt[N-1]&UPPER_MASK)|(mt[0]&LOWER_MASK); mt[N-1] = mt[M-1] ^ (y >> 1) ^ mag01[y & 0x1UL]; mti = 0; } y = mt[mti++]; /* Tempering */ y ^= (y >> 11); y ^= (y << 7) & 0x9d2c5680UL; y ^= (y << 15) & 0xefc60000UL; y ^= (y >> 18); return y; } /* generates a random number on [0,0x7fffffff]-interval */ long genrand_int31(void) { return (long)(genrand_int32()>>1); } /* generates a random number on [0,1]-real-interval */ double genrand_real1(void) { return genrand_int32()*(1.0/4294967295.0); /* divided by 2^32-1 */ } /* generates a random number on [0,1)-real-interval */ double genrand_real2(void) { return genrand_int32()*(1.0/4294967296.0); /* divided by 2^32 */ } /* generates a random number on (0,1)-real-interval */ double genrand_real3(void) { return (((double)genrand_int32()) + 0.5)*(1.0/4294967296.0); /* divided by 2^32 */ } /* generates a random number on [0,1) with 53-bit resolution*/ double genrand_res53(void) { unsigned long a=genrand_int32()>>5, b=genrand_int32()>>6; 
return(a*67108864.0+b)*(1.0/9007199254740992.0); } /* These real versions are due to Isaku Wada, 2002/01/09 added */

void SetSeed(unsigned long seed)
{
    init_genrand(seed);
}

unsigned long CreateSeed(void)
{
    static unsigned long differ = 0; // guarantee time-based seeds will change

    // Get a uint32 from t and c
    // Better than uint32(x) in case x is floating point in [0,1]
    // Based on code by Lawrence Kirby (fred@genesis.demon.co.uk)
    time_t t = time(NULL);
    clock_t c = clock();
    unsigned long h1 = 0;
    unsigned long h2 = 0;
    unsigned char *p = (unsigned char *) &t;
    size_t i, j;

    for( i = 0; i < sizeof(t); ++i ) {
        h1 *= UCHAR_MAX + 2U;
        h1 += p[i];
    }
    p = (unsigned char *) &c;
    for( j = 0; j < sizeof(c); ++j ) {
        h2 *= UCHAR_MAX + 2U;
        h2 += p[j];
    }
    return ( h1 + differ++ ) ^ h2;
}

double rndu(void)
{
    return genrand_real1();
}
Seq-Gen-1.3.4/source/twister.h000077500000000000000000000037711315746145500161560ustar00rootroot00000000000000/* Header file for twister.c */ /* Sequence Generator - seq-gen, version 1.3.4 Copyright (c)1996-2017, Andrew Rambaut & Nick Grassly Institute of Evolutionary Biology, University of Edinburgh All rights reserved. Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: 1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. 2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. 3. The names of its contributors may not be used to endorse or promote products derived from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. Any feedback is very welcome. http://tree.bio.ed.ac.uk/software/seqgen/ email: a.rambaut@ed.ac.uk */ #ifndef _TWISTER_H_ #define _TWISTER_H_ void SetSeed (unsigned long seed); unsigned long CreateSeed(void); double rndu (void); #endif /* _TWISTER_H_ */
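The real-valued wrappers in twister.c differ only in how one 32-bit draw from genrand_int32() is mapped to a double, and that mapping decides whether the endpoints 0 and 1 can occur. In particular, rndu() returns genrand_real1(), which covers the closed interval [0,1], so it can return exactly 0.0 or 1.0. The sketch below reproduces the same arithmetic as pure functions of the raw draws, so the endpoint behaviour can be checked without running the generator. The names ToReal1, ToReal2, ToReal3 and ToRes53 are hypothetical and not part of Seq-Gen, and the sketch assumes IEEE 754 double arithmetic.

```c
#include <stdint.h>

/* Hypothetical helpers mirroring genrand_real1/2/3 and genrand_res53 in
 * twister.c; they take the raw 32-bit draws as arguments instead of
 * calling genrand_int32(). Assumes IEEE 754 doubles. */

/* [0,1]: divide by 2^32 - 1, so 0xffffffff maps to 1.0 */
double ToReal1(uint32_t x)
{
    return x * (1.0 / 4294967295.0);
}

/* [0,1): divide by 2^32, so the largest result is 1 - 2^-32 */
double ToReal2(uint32_t x)
{
    return x * (1.0 / 4294967296.0);
}

/* (0,1): offset by half a step, so neither 0 nor 1 can be produced */
double ToReal3(uint32_t x)
{
    return ((double)x + 0.5) * (1.0 / 4294967296.0);
}

/* [0,1) with 53-bit resolution: keep the top 27 bits of a and the top
 * 26 bits of b, combine them into one 53-bit integer, divide by 2^53 */
double ToRes53(uint32_t a, uint32_t b)
{
    uint32_t hi = a >> 5, lo = b >> 6;
    return (hi * 67108864.0 + lo) * (1.0 / 9007199254740992.0);
}
```

For example, ToReal1(0xffffffff) is exactly 1.0 and ToReal1(0) is exactly 0.0, while ToReal3 stays strictly inside (0,1) for every input. A caller that takes log(u) of a uniform draw would therefore need the open-interval mapping of genrand_real3, or an explicit check for zero, to avoid log(0).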